A Reference Architecture for
Distributed Software
Deployment
DISSERTATION
for the purpose of obtaining the degree of doctor
at Delft University of Technology,
by authority of the Rector Magnificus Prof. ir. K.C.A.M. Luyben,
chairman of the Board for Doctorates,
to be defended in public
on Monday 3 June 2013 at 15:00
by
Sander VAN DER BURG
engineer in computer science (ingenieur informatica)
born in Numansdorp
This dissertation has been approved by the promotor:
Prof. dr. A. van Deursen
Copromotor: Dr. E. Visser
Composition of the doctoral committee:
Rector Magnificus, chairman
Prof. dr. A. van Deursen, Delft University of Technology, promotor
Dr. E. Visser, Delft University of Technology, copromotor
Prof. dr. R. di Cosmo, University of Paris-Diderot
Prof. dr. ir. D. H. J. Epema, Delft University of Technology and Eindhoven University of Technology
Prof. dr. Y. Tan, Delft University of Technology
Dr. E. Dolstra, LogicBlox Inc.
Dr. S. Malek, George Mason University
The work in this thesis has been carried out at the Delft University of Tech-
nology, under the auspices of the research school IPA (Institute for Program-
ming research and Algorithmics). The research was financially supported by
the Netherlands Organisation for Scientific Research (NWO)/Jacquard project
638.001.208, PDS: Pull Deployment of Services.
Copyright © 2013 Sander van der Burg
Cover: Lions Gate Bridge Vancouver, British Columbia, Canada
Printed and bound in The Netherlands by CPI Wöhrmann Print Service.
ISBN 978-94-6203-335-1
Preface
I vividly remember the first PhD defence ceremony I ever attended, that of a fellow villager, which coincidentally also took place at Delft University of Technology. At the time, I had just started studying for a master's degree in Computer Science at the same university. Although the ceremony made a lasting impression on me, I concluded that becoming a PhD student was a career step I would definitely never take. However, it turned out that I was wrong.
I have had a deep interest in programming and software technology from a very young age. For example, I wrote my very first computer program in the BASIC programming language on a Commodore 64.
At some point, I became interested in the do-it-yourself (DIY) construction of Linux distributions, best known through the Linux from Scratch (LFS) project (http://www.linuxfromscratch.org), as I was never satisfied with any of the common Linux distributions. The LFS project basically provides a book describing how to construct your own Linux system using nothing but source packages. Although it was fun to get a fully customised system working, it turned out that maintaining and upgrading such customised systems was complicated, time consuming and a big burden.
In the final year of my master's study, I met Eelco Visser, my co-promotor. Initially, I was looking for a Model Driven Engineering (MDE) thesis project, but after I told him about my interest in Linux from Scratch, he suggested that I take a look at a different project by a former PhD student of his (Eelco Dolstra), who had developed a package manager called Nix with very uncommon, yet very powerful features.
In the beginning, I was sceptical about Nix. However, when I read Eelco Dolstra's PhD thesis I learned more about the concepts and started liking it more and more. I also got involved with the rest of the Nix community, which had already attracted quite a few developers from various places around the world. Eventually, I decided to focus on using Nix and to change NixOS (a Linux distribution based on Nix) to have the features that I wanted, instead of doing everything myself.
Moreover, my master's thesis project was a collaboration with an industry partner, namely Philips Research. At Philips, I met Merijn de Jonge, who had been involved with the early stages of the development of Nix. As part of my master's thesis project, I developed the first Disnix prototype, allowing developers to deploy service-oriented systems in networks of machines, with a Philips system as a case study.
Although the initial prototype showed us a number of promising results, it
also raised many additional questions that had to be answered. Fortunately,
Philips and our research group had been working on a research project proposal, called the Pull Deployment of Services (PDS) project. When I was approached to become a PhD researcher in the PDS project, it was simply an offer I could not refuse. Therefore, I broke the promise that I had made to myself years ago.
ABOUT THIS THESIS
This thesis is the result of the work I have been doing for the last four years (or perhaps a bit longer). In the early stages of my research I got off to a very good start, getting a research paper accepted in the very first month. However, as with many PhD research projects, you have your ups (when your papers get accepted quite easily after peer review) and downs (when papers get rejected for reasons that are not obvious at all).
However, one of the things my promotor Arie van Deursen said about peer reviewing is: "Acceptance of a paper is nice, but rejections are there for you to learn". One comment from a particular reviewer that really kept me thinking concerned the difference between engineering and science, since we as researchers are supposed to focus on the latter, although our field of research lies within the former. Luckily, I was able to find a definition in the literature:
Research in engineering is directed towards the efficient accomplishment of spe-
cific tasks and towards the development of tools that will enable classes of tasks
to be accomplished more efficiently. [Wegner, 1976]
The open question that remains is: how can that vision be accomplished? In this thesis, I have tried to do so by describing the research that I have done, but also by giving relevant background information and by describing how a technique can be applied and implemented. Often, the adoption of research in practice lies in the details and in the way the knowledge is transferred.
I also had to make several "compromises", as opinions on a few tiny details differ among the people I have worked with. I even had to make a very big compromise with myself during this project. People who know me and have read this thesis will probably notice the two conflicting aspects I am referring to, but I leave that as an open exercise to the reader.
FURTHER INFORMATION

Apart from this thesis and the publications that have contributed to it, the tools developed as part of this thesis are available as Free and Open-Source software through the Nix project (http://nixos.org). Moreover, readers who want to learn more about the Nix project or any of its core concepts may also find it interesting to read Eelco Dolstra's PhD thesis [Dolstra, 2006].
ACKNOWLEDGEMENTS
This thesis could not have come about without the help of many people. Fore-
most, I wish to thank Eelco Dolstra, the author of Nix. He was involved in
nearly every aspect of this thesis and he was always sharp enough to spot
minor inconsistencies in my writings that I overlooked. Moreover, I wish to
thank my co-promotor Eelco Visser, who gave me the opportunity to do this
research. His guidance and advice were most helpful. My promotor Arie van Deursen helped me with many general aspects of research, such as putting a contribution into the right context and looking critically at scientific work.
Another name I should definitely mention is Merijn de Jonge, who wrote my
project proposal and was heavily involved in the early stages of my research
project. He gave me the inspiration to look at architectural concepts.
I would also like to thank the members of my reading committee, Roberto
di Cosmo, Sam Malek, Dick Epema, and Yao-hua Tan for reviewing my thesis.
They gave me insights I would not have thought of myself, and I am very happy that I was able to incorporate the majority of these new insights into this thesis.
Other important people that have contributed to this thesis are Daniel M.
German, Julius Davies and Armijn Hemel. Together, we wrote the paper that
contributed to Chapter 10 by combining our efforts in both software deploy-
ment and license compliance engineering.
Of course, there are many others who have contributed to my efforts in some way or another. Rob Vermaas, a fellow Nix-er and office mate (Eelco Dolstra, he and I shared the same room), made many practical contributions to Nix and many of the Nix subprojects. The three of us made a very interesting expedition across the island of O'ahu (part of Hawaii) in an SUV.
Moreover, the other members of our SLDE subgroup were also most helpful. Our projects have some overlap on the practical side and we always had very enjoyable daily stand-up meetings in the coffee corner. Therefore, I would like to thank Danny Groenewegen, Vlad Vergu, Gabriël Konat, Martin Bravenboer (from whom I borrowed the template for this thesis), Maartje de Jonge, Lennart Kats (who gave me a lot of advice on the practical details of publishing a thesis), Sander Vermolen, and Zef Hemel. Coincidentally, the latter four persons and I all seem to fit in my tiny car.
There are also various people at Philips, our project's industry partner, whom I would like to thank. Erwin Bonsma was the developer responsible for the build infrastructure of the Philips medical imaging platform and helped me a lot with some practical aspects. I would like to thank Nico Schellingerhout, Jan van Zoest, and Luc Koch for their continuous interest in my work and for giving me the opportunity to conduct experiments at Philips Healthcare. Wim
van der Linden was one of the developers of the SDS2 case study, who was
always willing to give me some technical advice when I needed it.
The #nixos IRC channel was a meeting point in cyberspace where I got in touch with many more people from all over the world contributing to Nix, all of whom I would like to thank. In particular, Ludovic Courtès, Shea Levy, Nicolas Pierron and Lluís Batlle i Rossell were developers I have frequently been in touch with and who gave me a lot of inspiration that contributed to several parts of this thesis.
I can also tell my readers that they should never underestimate the power of secretaries. Both Esther van Rooijen (from our university) and Cora Baltes-Letsch (from Philips) were very helpful in arranging many practical aspects of my employment, which saved me a lot of time. Besides my direct colleagues, contacts and project partners, there have been many more colleagues from our research group who have contributed to my work in some way or another, either by just having a look at what I was doing or by having a discussion. I have used Alberto González Sánchez's thesis cover design as the source of the layout for my cover. Thanks everyone!
Finally, I would like to thank my parents Hennie and Truus, and my brother
Dennis for their continuous support. Sometimes, when you are so busy doing “complicated” things, you get sloppy when it comes to the simple things in life. They have been most helpful in supporting me with the latter.
Sander van der Burg
May 06, 2013
Numansdorp
Contents
I Introduction 1
1 Introduction 3
1.1 Software deployment complexity . . . . . . . . . . . . . . . . . . 4
1.1.1 Early history . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.2 Operating systems and high level languages . . . . . . . 5
1.1.3 Component-based systems . . . . . . . . . . . . . . . . . 6
1.1.4 Service-oriented systems and cloud computing . . . . . 8
1.2 Hospital environments . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.2 Device-orientation . . . . . . . . . . . . . . . . . . . . . . 11
1.2.3 Service-orientation . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.1 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.3 Agility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.4 Genericity . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Research questions . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.7 Outline of this thesis . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.8 Origin of chapters . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 Background: Purely Functional Software Deployment 23
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 The Nix package manager . . . . . . . . . . . . . . . . . . . . . . 24
2.2.1 The Nix store . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.2 Specifying build expressions . . . . . . . . . . . . . . . . 25
2.2.3 Composing packages . . . . . . . . . . . . . . . . . . . . . 27
2.2.4 Building packages . . . . . . . . . . . . . . . . . . . . . . 28
2.2.5 Runtime dependencies . . . . . . . . . . . . . . . . . . . . 30
2.2.6 Nix profiles . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2.7 Atomic upgrades and rollbacks . . . . . . . . . . . . . . . 33
2.2.8 Garbage collection . . . . . . . . . . . . . . . . . . . . . . 33
2.2.9 Store derivations . . . . . . . . . . . . . . . . . . . . . . . 33
2.3 NixOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.1 NixOS configurations . . . . . . . . . . . . . . . . . . . . 36
2.3.2 Service management . . . . . . . . . . . . . . . . . . . . . 37
2.3.3 NixOS modules . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.4 Composing a Linux system . . . . . . . . . . . . . . . . . 41
2.4 Hydra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4.1 Defining Hydra build jobs . . . . . . . . . . . . . . . . . . 43
2.4.2 Configuring Hydra build jobs . . . . . . . . . . . . . . . 44
2.5 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5.1 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5.2 Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3 A Reference Architecture for Distributed Software Deployment 49
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2 Reference architectures . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3.1 Functional requirements . . . . . . . . . . . . . . . . . . . 50
3.3.2 Non-functional requirements . . . . . . . . . . . . . . . . 52
3.4 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.4.1 Nix: Deploying immutable software components . . . . 54
3.4.2 NixOS: Deploying complete system configurations . . . 54
3.4.3 Disnix: Deploying services in networks of machines . . 54
3.4.4 Dynamic Disnix: Self-adaptive deployment of services
in networks of machines . . . . . . . . . . . . . . . . . . . 54
3.4.5 DisnixOS: Combining Disnix with complementary in-
frastructure deployment . . . . . . . . . . . . . . . . . . . 55
3.4.6 Dysnomia: Deploying mutable software components . . 55
3.4.7 grok-trace: Tracing source artifacts of build processes . . 55
3.5 Architectural patterns . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5.1 Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5.2 Purely functional batch processing style . . . . . . . . . 56
3.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.6.1 Hydra: Continuous building and integration . . . . . . . 58
3.6.2 webdsldeploy: Deploying WebDSL applications in net-
works of machines . . . . . . . . . . . . . . . . . . . . . . 58
3.7 An architectural view of the reference architecture . . . . . . . . 59
3.8 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
II Service Deployment 63
4 Distributed Deployment of Service-Oriented Systems 65
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 Motivation: SDS2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2.2 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2.5 Deployment process . . . . . . . . . . . . . . . . . . . . . 71
4.3 Disnix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.2 Building a service . . . . . . . . . . . . . . . . . . . . . . 75
4.3.3 Composing intra-dependencies . . . . . . . . . . . . . . . 77
4.3.4 Services model . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.5 Infrastructure model . . . . . . . . . . . . . . . . . . . . . 80
4.3.6 Distribution model . . . . . . . . . . . . . . . . . . . . . . 82
4.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.4.1 Deployment process . . . . . . . . . . . . . . . . . . . . . 82
4.4.2 Disnix Service . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.4.3 Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4.4 Additional tools . . . . . . . . . . . . . . . . . . . . . . . 91
4.5 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.7 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.7.1 Component-specific . . . . . . . . . . . . . . . . . . . . . 93
4.7.2 Environment specific . . . . . . . . . . . . . . . . . . . . . 94
4.7.3 General approaches . . . . . . . . . . . . . . . . . . . . . 94
4.7.4 Common practice . . . . . . . . . . . . . . . . . . . . . . . 96
4.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5 Self-Adaptive Deployment of Service-Oriented Systems 97
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.2.1 StaffTracker (PHP/MySQL version) . . . . . . . . . . . . 98
5.2.2 StaffTracker (Web services version) . . . . . . . . . . . . 99
5.2.3 ViewVC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2.4 SDS2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.3 Annotating Disnix models . . . . . . . . . . . . . . . . . . . . . . 101
5.3.1 Services model . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3.2 Infrastructure model . . . . . . . . . . . . . . . . . . . . . 103
5.4 Self-adaptive deployment . . . . . . . . . . . . . . . . . . . . . . 103
5.4.1 Infrastructure generator . . . . . . . . . . . . . . . . . . . 105
5.4.2 Infrastructure augmenter . . . . . . . . . . . . . . . . . . 106
5.4.3 Distribution generator . . . . . . . . . . . . . . . . . . . . 107
5.5 Implementing a quality of service model . . . . . . . . . . . . . 108
5.5.1 Candidate host selection phase . . . . . . . . . . . . . . . 108
5.5.2 Division phase . . . . . . . . . . . . . . . . . . . . . . . . 111
5.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.6.1 StaffTracker (PHP/MySQL version) . . . . . . . . . . . . 113
5.6.2 StaffTracker (Web services version) . . . . . . . . . . . . 113
5.6.3 ViewVC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.6.4 SDS2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.6.5 General observations . . . . . . . . . . . . . . . . . . . . . 114
5.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.8 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
III Infrastructure Deployment 119
6 Distributed Infrastructure Deployment 121
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.2 Convergent deployment: Cfengine . . . . . . . . . . . . . . . . . 123
6.2.1 Similarities . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2.2 Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.3 Other deployment approaches . . . . . . . . . . . . . . . . . . . 126
6.4 Congruent deployment . . . . . . . . . . . . . . . . . . . . . . . . 127
6.4.1 Specifying a network configuration . . . . . . . . . . . . 127
6.5 Implementation of the infrastructure deployment process . . . 130
6.6 Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.7 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.8 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.10 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.11 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7 Automating System Integration Tests for Distributed Systems 135
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.2 Single-machine tests . . . . . . . . . . . . . . . . . . . . . . . . . 137
7.2.1 Specifying and running tests . . . . . . . . . . . . . . . . 137
7.2.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . 139
7.3 Distributed tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.3.1 Specifying networks . . . . . . . . . . . . . . . . . . . . . 140
7.3.2 Complex topologies . . . . . . . . . . . . . . . . . . . . . 142
7.4 Service-oriented tests . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.4.1 Writing test specifications . . . . . . . . . . . . . . . . . . 144
7.4.2 Deploying a virtual network . . . . . . . . . . . . . . . . 146
7.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.5.1 Declarative model . . . . . . . . . . . . . . . . . . . . . . 147
7.5.2 Operating system generality . . . . . . . . . . . . . . . . 148
7.5.3 Test tool generality . . . . . . . . . . . . . . . . . . . . . . 148
7.5.4 Distributed coverage analysis . . . . . . . . . . . . . . . . 149
7.5.5 Continuous builds . . . . . . . . . . . . . . . . . . . . . . 150
7.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
7.7 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
IV General Concepts 155
8 Atomic Upgrading of Distributed Systems 157
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.2 Atomic upgrading in Nix . . . . . . . . . . . . . . . . . . . . . . 158
8.2.1 Isolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
8.2.2 Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
8.2.3 Activation . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.3 Distributed atomic upgrading . . . . . . . . . . . . . . . . . . . . 160
8.3.1 Two phase commit protocol . . . . . . . . . . . . . . . . . 160
8.3.2 Mapping Nix deployment operations . . . . . . . . . . . 161
8.4 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.4.1 Supporting notifications . . . . . . . . . . . . . . . . . . . 162
8.4.2 The Hello World example . . . . . . . . . . . . . . . . . . 163
8.4.3 Adapting the Hello World example . . . . . . . . . . . . 163
8.4.4 Other examples . . . . . . . . . . . . . . . . . . . . . . . . 165
8.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
8.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
9 Deploying Mutable Components 169
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
9.2 The Nix deployment system . . . . . . . . . . . . . . . . . . . . . 170
9.3 Disnix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.4 Mutable components . . . . . . . . . . . . . . . . . . . . . . . . . 173
9.5 Dysnomia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9.5.1 Managing mutable components . . . . . . . . . . . . . . 173
9.5.2 Identifying mutable components . . . . . . . . . . . . . . 174
9.5.3 Extending Disnix . . . . . . . . . . . . . . . . . . . . . . . 175
9.6 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
9.7 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
10 Discovering License Constraints using Dynamic Analysis of Build
Processes 181
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
10.2 A Motivating Example . . . . . . . . . . . . . . . . . . . . . . . . 182
10.3 General Approach: Tracing Build Processes . . . . . . . . . . . . 184
10.3.1 Reverse-Engineering the Dependency Graph . . . . . . . 184
10.3.2 Why We Cannot Use Static Analysis . . . . . . . . . . . . 185
10.3.3 Dynamic Analysis through Tracing . . . . . . . . . . . . 187
10.4 Method: Tracing the Build Process through System Calls . . . . 187
10.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
10.4.2 Producing the Trace . . . . . . . . . . . . . . . . . . . . . 189
10.4.3 Producing the Build Graph . . . . . . . . . . . . . . . . . 190
10.4.4 Using the Build Graph . . . . . . . . . . . . . . . . . . . . 193
10.4.5 Coarse-grained Processes . . . . . . . . . . . . . . . . . . 193
10.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
10.5.1 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . 194
10.5.2 Usefulness . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
10.6 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . . . . 197
10.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
10.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
V Applications 201
11 Declarative Deployment of WebDSL applications 203
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 203
11.2 WebDSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
11.3 Examples of using the WebDSL language . . . . . . . . . . . . . 205
11.3.1 Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . 206
11.3.2 Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
11.3.3 Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
11.4 Deploying WebDSL applications . . . . . . . . . . . . . . . . . . 208
11.4.1 Deploying the WebDSL compiler . . . . . . . . . . . . . . 208
11.4.2 Building a WebDSL application . . . . . . . . . . . . . . 208
11.4.3 Deploying WebDSL infrastructure components . . . . . 209
11.5 webdsldeploy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
11.5.1 Single machine deployment . . . . . . . . . . . . . . . . . 209
11.5.2 Distributing infrastructure components . . . . . . . . . . 210
11.5.3 Implementing load-balancing . . . . . . . . . . . . . . . . 210
11.5.4 Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
11.6 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
11.6.1 Building and configuring a WebDSL application . . . . . 212
11.6.2 Generating a logical network specification . . . . . . . . 214
11.6.3 Configuring WebDSL infrastructure aspects . . . . . . . 216
11.7 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
11.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
11.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
12 Supporting Impure Platforms: A .NET Case Study 219
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
12.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
12.3 Global Assembly Cache (GAC) . . . . . . . . . . . . . . . . . . . 220
12.4 Deploying .NET applications with Nix . . . . . . . . . . . . . . . 221
12.4.1 Using Nix on Windows . . . . . . . . . . . . . . . . . . . 221
12.4.2 Build-time support . . . . . . . . . . . . . . . . . . . . . . 221
12.4.3 Run-time support . . . . . . . . . . . . . . . . . . . . . . . 223
12.4.4 Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
12.5 Deploying .NET services with Disnix . . . . . . . . . . . . . . . 229
12.5.1 Porting the StaffTracker example . . . . . . . . . . . . . . 229
12.6 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
12.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
12.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
VI Conclusion 235
13 Conclusion 237
13.1 Summary of contributions . . . . . . . . . . . . . . . . . . . . . . 237
13.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
13.2.1 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
13.2.2 Reproducibility . . . . . . . . . . . . . . . . . . . . . . . . 240
13.2.3 Genericity . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
13.2.4 Extensibility . . . . . . . . . . . . . . . . . . . . . . . . . . 241
13.2.5 Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
13.3 Research questions revisited . . . . . . . . . . . . . . . . . . . . . 241
13.4 Recommendations for future work . . . . . . . . . . . . . . . . . 244
13.4.1 Network topologies . . . . . . . . . . . . . . . . . . . . . 244
13.4.2 Infrastructure-level self-adaptability . . . . . . . . . . . . 245
13.4.3 More declarative system integration testing of service-
oriented systems . . . . . . . . . . . . . . . . . . . . . . 245
13.4.4 Test scalability . . . . . . . . . . . . . . . . . . . . . . . . 246
13.4.5 Patterns for fully distributed atomic upgrades . . . . . . 247
13.4.6 Implementing a license calculus system . . . . . . . . . . 247
13.4.7 More sophisticated state deployment . . . . . . . . . . . 247
13.4.8 Extracting deployment specifications from codebases . . 247
13.4.9 Case studies with service composability . . . . . . . . . 248
13.4.10 Software deployment research in general . . . . . . . . . 248
Bibliography 251
Samenvatting 265
Curriculum Vitae 271
Titles in the IPA Dissertation Series 273
Part I
Introduction
1 Introduction
Software has become increasingly important in our society. Computer pro-
grams are virtually everywhere and used to control critical functions, such as
the stock market, aircraft, pacemakers, and other medical devices. Through-
out the years the way we use software has changed dramatically. Originally,
computer programs were designed for large expensive dedicated machines
with very limited access. Nowadays, computer programs are available as ser-
vices through the Internet and accessible by nearly anyone from almost any
place.
Furthermore, software systems have become bigger and more complicated, while we want to develop them in short development cycles and to have them work correctly. For these reasons, software engineering has become an important academic research discipline. Software engineering has been defined as:
Software Engineering (SE) is the application of a systematic, disciplined, quan-
tifiable approach to the development, operation, and maintenance of software,
and the study of these approaches; that is, the application of engineering to
software [The Joint Task Force on Computing Curricula, 2004].
The software engineering process consists of many activities, such as spec-
ifying requirements, creating a design, writing code, and testing. One of the
activities that is typically overlooked within the research community and un-
derestimated by practitioners is the software deployment process. A definition
of software deployment is:
Software deployment refers to all the activities that make a software system
available for use [Carzaniga et al., 1998].
Examples of activities that belong to the software deployment process are
building software from source code, packaging software, installing software
packages, activation of software components, and upgrading. For various rea-
sons, deployment processes turn out to be very complicated and error prone.
For the latest generation of software, which is offered as services through the Internet, many new challenges have arisen in addition to the deployment challenges of the past.
This thesis is about dealing with software deployment complexities for the
latest generation of software systems. We define a reference architecture that
contains tools solving various deployment tasks, which can be integrated with
domain-specific components that solve deployment aspects for a specific do-
main. From this reference architecture, a concrete architecture for a domain-
specific deployment system can be derived, offering fully automatic, reliable, reproducible, and efficient deployment. Although the idea of a deployment architecture is not completely new, our reference architecture builds on top of the purely functional deployment model [Dolstra, 2006], which has several unique advantages compared to conventional deployment solutions.

Figure 1.1 The Electrologica X1
1.1 SOFTWARE DEPLOYMENT COMPLEXITY
As mentioned previously, software deployment processes have become very complicated, error prone, and tedious. Compared to many other subfields within the software engineering domain (e.g. programming languages), software deployment did not become a research subject within the software engineering community until 1997 [van der Hoek et al., 1997].
1.1.1 Early history
When the first digital computers appeared in the 1940s and 1950s, software deployment was not a problem at all. Most computer programs were written directly in machine code for a given machine architecture. Sometimes the target machine architecture did not even exist yet, as it still had to be constructed, and was only available on paper.
In those days, writing programs was a very complex, challenging and tedious job. Apart from decomposing a problem into smaller subproblems that could be programmed, a programmer also had to deal with many side issues, such as managing hardware resources and properly accessing hardware peripherals, such as a printer. A famous example of developing programs in such a style is described in Edsger Dijkstra's PhD thesis [Dijkstra, 1959], in which a real-time interrupt handler was developed for the Electrologica X1 (shown in Figure 1.1), one of the first computers built in the Netherlands.

Figure 1.2 Commodore 64 combo showing a BASIC prompt
Because programs were written directly in machine code for a given archi-
tecture, programs were not portable and were mostly discarded and rewritten
when newer computer models appeared. For these reasons, a program was
deployed only once.
1.1.2 Operating systems and high level languages
Manually managing system resources and peripherals was a very tedious
job. For that reason, the concept of the operating system was developed in the
late 1950s, providing a set of programs that manages computer hardware re-
sources and provides common services for application software [Tanenbaum
and Woodhull, 1997]. As a result, programmers no longer had to care much about the inner workings of the hardware and could focus more on solving actual problems.
Furthermore, writing machine code was also very tedious, error prone, and difficult for humans to comprehend. Therefore, higher level languages were developed; FORTRAN was among the first high level programming languages. Operating systems, too, were eventually developed in higher level languages. UNIX is a prominent example, implemented in the C programming language, which is still widely used nowadays.
The Commodore 64 (shown in Figure 1.2), which gave me my first programming experience, is a good example of an operating system providing an integrated high-level language interpreter. Apart from the operating system and BASIC interpreter stored in ROM, a complete program, including its dependencies, was stored on a disk or tape.
Because of these developments, it became easier to develop applications, as operating systems hid complexity and high level languages increased productivity. As a consequence, the process of getting applications to work (i.e. deployment) became harder. For example, in order to be able to use a computer program implemented in a higher level language, we need the right version of the compiler or interpreter together with a compatible operating system. If either of these components is missing or incompatible, the program may not work as expected or may not work at all. Fortunately, the versions of operating systems and programming language compilers did not change frequently enough to make software deployment a real burden.

Figure 1.3 The NixOS Linux distribution running the KDE Plasma desktop
1.1.3 Component-based systems
The early 1980s were the era in which the graphical desktop environment slowly gained acceptance and in which the component-based software engineering (CBSE) discipline was introduced. Software programs were no longer self-contained units, but started reusing components (such as libraries) developed by third parties, some of which already resided on the target machines [Szyperski et al., 2002].
CBSE greatly improved programmer productivity. For example, to develop
a program for a desktop environment, developers no longer had to implement
GUI widgets over and over again. Furthermore, the ability to share software components also reduced the amount of disk space and RAM that programs require.
Although CBSE provides a number of great advantages, it also introduced
additional complexity and challenges to software deployment processes. In
order to be able to run a software program, all dependencies must be present
and correct and the program must be able to find them.
It turns out that there are all kinds of things that can go wrong while deploying a componentised system. A dependency may be missing, or a program may require a newer version of a specific component. Newer components may be incompatible with a program. Sometimes incompatibility is intentional, but it also happens accidentally due to a bug on which a program relies. In the Microsoft Windows community this phenomenon is known as the “DLL-hell” [Pfeiffer, 1998], but it occurs in many other contexts as well, such as the “JAR-hell” for Java programs.
In UNIX-like systems such as Linux, the degree of sharing of components through libraries is taken almost to the maximum. For these kinds of systems, it is crucial to have deployment tooling that properly manages the packages installed on a system. In Linux distributions, the package manager is a key aspect and a distinct feature that sets a particular Linux distribution apart from another. There are many package managers around, such as RPM [Foster-Johnson, 2003], dpkg, portage, pacman, and Nix (which we use extensively in our research).
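To give a first impression of what such package management looks like in practice with Nix, the sketch below shows a few typical command-line operations; the package name is purely illustrative, and the underlying mechanisms are explained in Chapter 2:

  $ nix-env -i hello        # install a package into the current user profile
  $ nix-env -u '*'          # upgrade all packages in the profile
  $ nix-env --rollback      # switch back to the previous profile generation

Each operation creates a new generation of the user's profile instead of destructively modifying the current one, which is what makes the rollback operation possible.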
Figure 1.3 shows a screenshot of the KDE Plasma desktop (http://www.kde.org) as part of the NixOS Linux distribution. At the time of writing this thesis, the base package of KDE 4.8 without any add-ons is composed of 173 software packages (a number determined by running the command-line instruction nix-store -qR /var/run/current-system/sw/bin/startkde | wc -l).
Apart from the challenges of deploying a system from scratch, many prob-
lems arise when systems are upgraded. Upgrading is necessary because it is
too costly and time consuming to redeploy them from scratch over and over
again. In most cases, upgrading is a risky process, because files get modi-
fied and overwritten. An interruption or crash may have disastrous results.
Furthermore, an upgrade may not always give the same result as a fresh in-
stallation of a system.
In addition to technical challenges, there are also several important non-
functional challenges while deploying systems using off-the-shelf components.
An important non-functional requirement may be the licenses under which
third party components are released. Nowadays, many Free and Open-Source
components [Free Software Foundation, 2012b; Open Source Initiative, 2012]
are incorporated into commercial products. Although most of these compo-
nents can be downloaded for free from the Internet, they are not in the public
domain. They are in fact copyrighted and distributed under licenses ranging
from simple permissive ones (allowing one to do almost anything including
keeping modifications secret) to more complicated “copyleft” licenses [Free
Software Foundation, 2012a] imposing requirements on derived works. Not
obeying these licenses could result in costly lawsuits by copyright holders.
Therefore, it is important to know exactly from what components a system is composed and under what licenses these components are distributed, which is not trivial for deployment processes performed in an ad-hoc manner. Note that the meaning of the Free and Open-Source terms is often misunderstood: they do not refer to software that is available for free (a.k.a. gratis), nor to software for which only the source code is available. Both terms are about software that may be freely used for any goal, distributed, studied, modified and even sold for any price. I wrote a blog post about this for further clarification [van der Burg, 2011f].

Figure 1.4 A collection of web applications and services
1.1.4 Service-oriented systems and cloud computing
The rise of the Internet and the World-Wide-Web (WWW) in the mid 1990s
caused a paradigm shift in the way some applications were offered to end-
users. Applications no longer had to be installed on the machines of end
users, but were accessed by using a web browser, also called a thin client. The
actual computer system processing user tasks was no longer located on the
client machine, but on a server hosting the web application.
Furthermore, the Internet protocols and file formats changed the way ap-
plications were developed and constructed. The service-oriented computing
(SOC) paradigm became a popular way to rapidly develop low-cost, interop-
erable, evolvable, and massively distributed applications [Papazoglou et al.,
2007].
The SOC vision is often realised by implementing a Service Oriented Architecture (SOA), in which a system is decoupled into “services”: autonomous, platform-independent entities that can be described, published, discovered, and combined, and that can perform functions ranging from answering simple requests to executing sophisticated business processes.
Cloud computing is a relatively new term coined for a new generation of
software systems where applications are divided into sets of composite ser-
vices hosted on leased, highly distributed platforms [Grundy et al., 2012].
Cloud computing providers offer services that can be roughly divided into three categories:
Infrastructure-as-a-Service (IaaS) is a service model providing computer
hardware resources, typically in the form of virtual machines. Cus-
tomers get charged based on the amount of system resources they are
using. IaaS typically provides simpler and more flexible hardware man-
agement.
Platform-as-a-Service (PaaS) is (in most cases) a solution stack providing
an application server, which includes an operating system, execution en-
vironment, database, web server and other relevant infrastructure com-
ponents. Application developers can develop and run their software ap-
plications on a PaaS platform without the cost and complexity of buying
and managing the underlying hardware and infrastructure components.
Software-as-a-Service (SaaS) provides application software to end-users,
which can be typically accessed by using thin clients. For these kinds
of services, all resources including the deployment of the service compo-
nents and underlying infrastructure are managed by the service provider.
SaaS applications may also utilise PaaS or IaaS services to provide their
hardware resources and infrastructure components.
Although there is no clear consensus on what cloud computing exactly means, in this thesis we regard it as a continuation of the service-oriented paradigm, in which services are offered through the Internet, including all their required resources, such as hardware and software, and their configuration management.
The latest generation of software systems, typically offered as services on the Internet, offers a number of additional benefits compared to the traditional desktop applications described in the previous subsection. End users are no longer bothered by deployment issues, and they always have access to the latest versions of their applications, including their data, which are accessible through their web browser from virtually any place.
Figure 1.4 shows a collection of prominent web applications and services, such as Twitter (http://www.twitter.com), Amazon (http://www.amazon.com) and Facebook (http://www.facebook.com). Although these applications seem to be relatively simple software systems, they are in fact very complicated and heterogeneous, as they are composed of many distributable software components
developed by various parties, implemented using various programming languages and various types of components, and deployed on several kinds of machines (having different characteristics) in various places in the world. All these services cooperate to achieve a common goal.
The web seems to have hidden most deployment issues from end-users, but
this does not mean that the software deployment problem is solved or not im-
portant anymore. Although the software deployment problems have moved
from end-user machines to data-centers, all the new possibilities of service-
orientation introduce even more challenges to the deployment process of soft-
ware systems, next to a number of existing complexities that have arisen in
the past. Since software components are distributed across various machines,
more machines need to be installed, configured and upgraded. Distributable
components have to connect to other services and may be dependent on each
other. Breaking such dependencies may render a software system partially or
completely unavailable.
Furthermore, there are all kinds of non-functional deployment properties
that must be met, such as performance and privacy concerns. For example,
if we deploy a component to a server with insufficient system resources, the entire service may perform so poorly that it does not attract any users. It is also undesirable to deploy a
privacy-sensitive dataset in a zone with public access.
1.2 HOSPITAL ENVIRONMENTS
The research in this thesis was carried out in the NWO/Jacquard project PDS (Pull Deployment of Services), a collaboration between Delft University of Technology and Philips Healthcare. The
research goal of this thesis is inspired by the medical domain. As with many
other domains, software systems in the medical domain are becoming service-
oriented. Apart from the software deployment challenges described in the
previous section, hospital environments have a number of distinct character-
istics and deployment challenges that require extra care.
1.2.1 Background
Hospitals are complex organisations, requiring the coordination of specialists
and support staff operating complex medical equipment, involving large data
sets, to take care of the health of large numbers of patients. The use of in-
formation technology for diagnosis and for storage and access of patient data
is of increasing importance. Hospitals are evolving into integrated informa-
tion environments, where patient data, ranging from administrative records
to high-density 3D images, should be accessible in real time at any place in
the hospital.
Data is used by people in different roles such as doctors, nurses, analysts,
administrators, and patients. Each user group uses different portions of the
data for different purposes and at different locations, requiring careful ad-
ministration and application of access rights. Data is also accessed by medical equipment and software, for example, to improve diagnosis by combining information from multiple sources. A picture symbolising this complexity is shown in Figure 1.5.

Figure 1.5 Hospital environment complexity in a nutshell
1.2.2 Device-orientation
The infrastructure of typical hospital environments is currently mostly device-
oriented. That is, the components implementing a workflow are statically de-
ployed to fixed devices, which leads to overcapacity due to suboptimal usage
of resources. Resources are reserved for particular workflows, even if not
used. This leads to inflexibility in reacting to events, a multitude of deploy-
ment and maintenance scenarios, and it requires users to go to the device that
supports a particular task. Because of these problems, the medical world is
changing to a service-oriented environment in which the access to services is
decoupled from the physical access to particular devices. That is, users should
be able to access data and perform computations from where they are, instead
of having to go to a particular device for realising a task.
However, the information technology infrastructure of hospitals is hetero-
geneous and consists of thousands of electronic devices, ranging from work-
stations to medical equipment such as MRI scanners. These devices are con-
nected by wired and wireless networks with complex topologies with differ-
ent security and privacy policies applicable to different nodes. These devices
have widely varying capabilities in terms of processing speed, graphical ren-
dering performance, storage space and reliability, and so on.
1.2.3 Service-orientation
To support running applications in such an environment, hospital machines
have to be treated as a dynamic cloud, where components of the applica-
tion are automatically deployed to machines in the cloud with the required
capabilities and connectivity. For instance, when starting a CPU-intensive
application (e.g., a viewer for 3D scans) on a sufficiently powerful PC, the
computation component of the application would be deployed automatically
to the local machine. On the other hand, if we ran it on an underpowered PDA, this component would be deployed to a fast server sufficiently close to the PDA in the network topology.
This kind of cloud deployment requires two things. First, it is necessary to
design applications in a way that allows them to be distributed across different
nodes dynamically, and to create a model of applications that describes their
components and the dataflows between them. These components can then
be mapped onto nodes in the cloud with the appropriate quality-of-service
characteristics.
Second, given a mapping from components to machines in the cloud, it
is necessary to deploy each component to its selected machine. Software de-
ployment in a heterogeneous environment is inherently difficult. Moreover,
maintaining such installations is even more difficult, because of the growing
amalgam of versions and variants of the software in combination with chang-
ing requirements. The practice of software deployment of complex medical
software in hospital environments is based on ad-hoc mechanisms, making
software deployment a semi-automatic process requiring significant human
intervention. Thus, it is essential that deployment is automatic and reliable;
the deployment of a component to a node should not interfere with other
applications or other versions of the component running on the node.
1.3 CHALLENGES
What may have become evident after reading the previous sections is that
changing non-functional requirements and various developments in software
engineering have significantly improved the quality and availability of soft-
ware systems, reduced the amount of required system resources and increased
the productivity of programmers. The negative side effect, however, is that software deployment processes have become increasingly difficult. Especially for the latest generation of software systems offered as services, a number of important challenges have to be dealt with.
1.3.1 Complexity
Software systems are becoming larger, which often implies that many components must be deployed and that many deployment tasks must be performed, such as building, installing and activating. Apart from the large num-
ber of tasks that must be performed, components may also be implemented in
various programming languages and designed for various types of operating
systems. For large systems, it is practically infeasible to perform deployment
processes manually.
Moreover, we have also seen that software systems are rarely self-contained. They have many dependencies which must be present and correct, both at build-time and run-time. To ensure correctness, developers and system administrators must know exactly what the dependencies of a component are, which is not always obvious. As a consequence, it often happens that dependencies are missed. For these reasons, systems are sometimes deployed from scratch rather than upgraded, because upgrade actions may break a system.
Apart from the correctness of the deployment process, a number of non-functional requirements may also be very important, such as performance, privacy and the licenses under which components are governed. The hospital environment challenges described in Section 1.2 can be considered a special case of service-orientation, with a higher diversity of devices with varying capabilities and more kinds of component variants that can be combined in various ways.
1.3.2 Reliability
Besides the fact that dependencies may be missed during an installation or an
upgrade, most deployment tasks are typically carried out in an imperative and
destructive manner. For example, during an upgrade, files may be removed,
adapted or overwritten. These kinds of operations are difficult to undo, and
an interruption of such a process may leave a system in an inconsistent state.
Moreover, an upgrade does not always yield the same result as a fresh instal-
lation and may give unexpected results.
Another undesirable side-effect of imperative upgrades is that there may
be a large time-window in which the system is partially or completely un-
available to end-users.
In addition to the reliability of deployment steps, systems deployed in a
distributed environment may also crash and render parts of a system unavail-
able. In such cases, we must redeploy a system, which is practically infeasible
to do in a short time window for systems deployed in an imperative and
ad-hoc manner.
1.3.3 Agility
In addition to the fact that software deployment complexity has increased, we
also want to develop and improve software systems faster. Whereas in the old
days, a programmer had to almost completely know the design of a system
in advance (e.g. waterfall methods) before it could be developed, nowadays
agile methods [Beck et al., 2001] are more common, with short development
cycles of about two weeks, in which an initial prototype is developed and
continuously improved to become the final product. After each development
cycle, a new version of a software system must be deployed, so that end-users
can try a new version of the application.
Agile software deployment has manifested itself in a term known as DevOps, an emerging set of principles, methods and practices for communication, collaboration and integration between software development and IT operations professionals [Pant, 2009]. Traditionally, developers and IT operators performed separate roles, which can be carried out better if they are more integrated.
Furthermore, software systems are not only deployed in production environments,
but also in test environments so that integration tests can be performed.
Because deployment is very complicated (especially for distributed systems),
people refrain from doing continuous integration tests for these kinds of
systems. Most importantly, if these systems are tested at all, they are only
tested on single machines and not in a distributed setting, while many types
of bugs and other issues only arise when software systems are tested in real
distributed settings [Rutherford et al., 2008].
Finally, once a system has been successfully tested in a test environment,
it may also be difficult to guarantee that we can reproduce the exact same
configuration in a production environment.
1.3.4 Genericity
Traditionally, software deployment processes were performed semi-automatically
and in an ad-hoc fashion. For large and complex systems, semi-automated
deployment is infeasible; therefore, automation is required to perform these
processes faster and to make them less error-prone.
Nowadays, there are many tools that can be used to automate software
deployment tasks. However, they have various drawbacks. Some tools are de-
signed for specific types of components, such as Java EE applications [Akker-
man et al., 2005], and other tools for specific environments, such as grid com-
puting [Caron et al., 2006]. There are also several general deployment solu-
tions, such as the Software Dock [Hall et al., 1999]. Apart from their limited
scope and applicability, these tools imperatively modify the configuration of
systems, so that it is hard to reproduce a configuration elsewhere and to roll
back changes.
Modern software systems, such as service-oriented systems, may be composed of
services implemented in various programming languages, using various component
technologies, and deployed on machines in a network running various types of
operating systems. Existing deployment tools make it difficult to support such
heterogeneous software systems and to ensure important quality attributes,
such as reliability.
1.4 PROBLEM STATEMENT
Software systems are developed by developers with certain goals, features and
behaviour in mind. Once a software system is to be used by end-users, it has
to be made available for use in the consumer environment or, for the latest
generation of systems, in a cloud environment hosted in a data center. It is
important that the software system behaves exactly the way the developers
intended.
The software deployment vision that we explore in this thesis is to com-
pletely automate deployment processes of the latest generation of software
systems and to perform such processes in a reliable, reproducible and efficient
manner. The intention is that software systems can be deployed frequently and
on demand. For example, we can reproduce deployment scenarios in various
environments, whether physical hardware, cloud environments or virtual
machines, for end-users or for testing. We have seen that this vision is
hard to achieve due to the complexities that newer developments have intro-
duced.
Nix [Dolstra et al., 2004; Dolstra, 2006] is a deployment system with a purely
functional deployment model, developed by Eelco Dolstra as part of his PhD
research, which intends to realise the vision of automated, complete, reliable
and reproducible deployment. The Nix package manager is a model-driven
deployment tool, which builds packages from Nix expressions. Furthermore, it
borrows concepts from purely functional programming languages [Hudak, 1989],
such as Haskell, to make deployment more reliable.
Most notably, Nix is used as a basis for NixOS [Dolstra and Löh, 2008;
Dolstra et al., 2010], a GNU/Linux distribution built around the Nix package
manager. Just as Nix realises packages from declarative specifications, NixOS
is a Linux distribution that derives complete system configurations from
declarative specifications.
Nix and NixOS provide solutions for several deployment aspects in our deployment
vision. However, they are designed for the deployment of single systems.
Furthermore, they are general solutions that do not take the non-functional
aspects of a domain into account, which cannot be addressed in a generic manner.
In order to support the latest generations of systems offered as services,
we have to support a number of distributed deployment aspects as well as
a number of non-functional deployment properties of a certain domain. As
modern systems are distributed, heterogeneous and have requirements
that cannot be addressed in a generic manner, it is impossible to develop a sin-
gle deployment tool that is suitable for all domains. Instead, we intend to
realise our deployment vision by means of a reference architecture [Bass et al.,
2003; Taylor et al., 2010] describing a family of domain-specific deployment
tools, containing components implementing various deployment concepts in
which domain-specific extensions can be integrated. A reference architecture
is defined by Taylor et al. [2010] as:
The set of principal design decisions that are simultaneously applicable to mul-
tiple related systems, typically within an application domain, with explicitly
defined points of variation.
Our reference architecture can be used to realise a concrete architecture
for a domain-specific deployment tool, capable of automating deployment
processes in a reliable, reproducible, and efficient manner while taking
domain-specific deployment aspects into account.
Although the idea of using software architecture concepts to automate
deployment processes is not entirely new (for example, it has been explored in
the IBM Alpine project [Eilam et al., 2006]), the major difference of our
approach is that we use the purely functional deployment model as a basis.
This offers a number of advantages, but also a number of challenges, mainly
due to a number of important non-functional requirements that must be met.
1.5 RESEARCH QUESTIONS
The goal of this thesis is to design a reference architecture for distributed
software deployment. To implement the components of this reference archi-
tecture, we have to provide answers for the following research questions:
Research Question 1
How can we automatically deploy heterogeneous service-oriented systems in a
network of machines in a reliable, reproducible and efficient manner?
The Nix package manager has been developed as a generic package management
tool for deploying packages on single systems. Service-oriented systems are
composed of distributable components (also called services) using various
technologies, which may be interconnected and can be deployed in networks of
machines having different operating systems, architectures and underlying
infrastructure components. In order to automate the deployment of these kinds
of systems, additional specifications and tools are required that properly
utilise the purely functional deployment concepts of Nix.
Apart from the service components that constitute a service-oriented system,
we also have to deploy the underlying infrastructure components, such as
application servers and a DBMS, as well as complete operating systems
supporting essential system services. Furthermore, system configurations can
be deployed in various environments, such as physical machines, virtual
machines hosted in a cloud environment, or hybrid variants. In order to be
flexible, it is desirable to be able to reproduce configurations in various
environments, such as moving a network of machines from one IaaS provider to
another.
Research Question 2
How can we efficiently deal with deployment related events occurring in a net-
work of machines?
After a service-oriented system has been deployed in a network of machines,
it is not guaranteed that it will stay operational, as many events may occur.
For example, a machine may crash, a network link may break, a new machine
offering additional system resources may be added to the network, or a machine
may be upgraded. In such cases, redeployment is needed to keep the overall
system operational and to meet all the required non-functional deployment
properties.
Furthermore, manually redeploying distributed systems in such cases can be
very laborious and error-prone. Therefore, it is desirable to explore whether
we can automatically redeploy in case of such events.
It is also possible to derive many infrastructure properties dynamically, by
using a discovery service which reveals machine characteristics and by using
deployment planning algorithms, which dynamically map services to machines
with the right characteristics. Characteristics can be either technical or
non-functional and specific to the domain in which the system is to be used.
In order to support these non-functional properties, we need to craft an
extended deployment tool which properly supports these requirements.
Research Question 3
How can we declaratively perform distributed system integration tests?
As mentioned earlier, systems should not only be deployed in production
environments, but also in test environments so that system integration tests
can be performed. A good practice for non-distributed componentised systems
is to do continuous integration and testing, i.e. for every change in the
source code repository the package gets built and tested on a build server.
For many GNU/Linux source packages, a test procedure can be executed by
running make check on the command-line.
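For instance, a typical GNU Autotools-style build and test session looks roughly as follows (a generic sketch, not tied to any particular package):

$ ./configure    # configure the package for the build machine
$ make           # compile the package
$ make check     # run the package's test suite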
For distributed systems, it is desirable to have an equivalent procedure, but
this vision is difficult to achieve, because deployment is very difficult and
expensive. As a consequence, people typically refrain from testing systems in
distributed environments, which is undesirable, since certain errors do not
manifest themselves on single systems. For all these reasons, it is desirable
to find a solution for this problem.
Research Question 4
How can we automatically and reliably deploy mutable software components in
a purely functional deployment model?
Nix is known as the purely functional deployment model, because it borrows
concepts from purely functional programming languages. As a result, one of
its characteristics is that components which have been built are made
immutable, to make deployment deterministic and reproducible. Unfortunately,
not all types of components can be managed in such a model, such as databases.
Apart from an initialisation step on first usage, these types of components
still need to be deployed manually, which can be a burden in a large network
of machines. To address this complexity, it is desirable to have a solution
that deploys mutable components with semantics close to those of Nix.
Research Question 5
How can we determine under which license components are governed and how
they are used in the resulting product?
We have also explained that license governance is a very important
non-functional requirement when deploying a software system. It is undesirable
to deploy a system which violates the licenses under which third-party
components are governed, as this may result in costly lawsuits by copyright
holders. Therefore, we must know exactly which files are used to produce a
specific artifact, such as an executable, and how they are used.
[Table 1.1: Overview of the research questions and the covering chapters. Rows: research questions 1-6; columns: Chapters 4-12; an X marks each chapter in which a research question is addressed.]
Research Question 6
How can we apply the components of the reference architecture?
The reference architecture for distributed software deployment is a means to
realise a concrete architecture for a domain-specific deployment tool.
Therefore, it is desirable to explore how we can implement such a
domain-specific tool and what lessons we can learn from it.
Furthermore, the Nix package manager (also known as the purely func-
tional deployment model) is typically applied in somewhat idealised environ-
ments in which components can be easily isolated. However, in practice, it
may happen that large existing codebases must be deployed using compo-
nents from the reference architecture in closed environments which cannot be
completely isolated. In such environments, adopting components from our
reference architecture is a non-trivial process and sometimes compromises
must be made. Therefore, it is important to explore the application of these
components in impure environments and to derive additional solutions and
compromises.
1.6 APPROACH
Table 1.1 provides an overview of the questions that were investigated and
their corresponding chapters. To answer the research questions, we first
thoroughly explore the concepts of Nix and NixOS, which serve as the
foundations of the reference architecture, as their concepts are quite
uncommon. From our requirements and these concepts, we derive the structure
of our reference architecture for distributed software deployment, including
a number of important quality attributes.
The next step is to investigate how we can extend the concepts of Nix to
service-oriented systems, that is, systems composed of components that can be
distributed across machines in a network. Then we explore another important
aspect, namely the deployment of machine configurations in a network with the
same deployment properties, such as reliability, reproducibility and
efficiency.
After exploring two distributed deployment aspects that must be captured
in our reference architecture, we try to improve and implement a number of
general orthogonal concepts. For reliability, it is desirable to have distributed
atomic upgrades. We must also explore how to manage components that
cannot be managed through the purely functional model. Moreover, it is
important to find a solution to the license governance problem.
Then we investigate how we can apply the components of our reference ar-
chitecture to realise a concrete architecture for a domain-specific deployment
tool as well as how we can apply these components in impure environments.
Finally, we reflect on the reference architecture and we investigate how well
we have achieved our desired quality attributes.
1.7 OUTLINE OF THIS THESIS
Chapter 2 gives background information about Nix, which is also known as the
purely functional deployment model, and two relevant applications built on top
of it: NixOS, an entire Linux distribution utilising Nix, and Hydra, a
Nix-based continuous build and integration service. This background
information serves as an important foundation of this thesis and is later used
in Chapter 3 to derive a reference architecture for distributed software
deployment, which can be used to create a concrete architecture for a
domain-specific deployment tool.
The following two chapters cover service deployment. Chapter 4 describes the
first component of the reference architecture: Disnix, a tool that can be used
to deploy services in networks of machines. Disnix extends Nix with additional
models and provides management of inter-dependencies, which are dependencies
among services that may reside on different machines in the network. Chapter 5
describes a self-adaptive deployment extension to Disnix, which automatically
redeploys service-oriented systems in case of an event.
The next two chapters cover infrastructure deployment aspects. Chapter 6
describes DisnixOS, an extension to Disnix that uses Charon, a NixOS-based
deployment tool that manages complete system configurations in a network of
machines, to provide complementary infrastructure deployment. Chapter 7
demonstrates how the network models of the previous chapter can be used for
creating efficient networks of virtual machines and shows how to perform
distributed system integration tests in these virtual networks.
The following two chapters describe several important general ingredients of
the reference architecture. Chapter 8 describes how the distributed atomic
upgrading property is achieved in our deployment tools.
Chapter 9 describes Dysnomia, a proof-of-concept tool for deploying mutable
software components. Currently, Nix only supports automated deployment of
immutable components, which never change after they have been built.
Databases, which are mutable, cannot be automatically deployed using Nix.
Dysnomia provides a solution for this problem and can be integrated into
Disnix.
In Chapter 10, we provide a solution for the licensing problem of software
components, by dynamically tracing build processes performed by Nix. These
traces can be used to produce graphs revealing which files are used during a
build and how these files are combined into the resulting artifacts, such as
an executable. This information can be used to determine whether a particular
deployment scenario violates a specific license policy.
The last chapters describe a number of applications using tools that are part
of the reference architecture. Chapter 11 demonstrates how the reference
architecture can be used to provide a deployment solution for a particular
application domain. We have developed an application-specific deployment tool
for WebDSL [Visser, 2008], a domain-specific language for developing web
applications with a rich data model. Chapter 12 describes experiences from our
collaboration with Philips Healthcare, in which we show how to apply pieces of
the reference architecture in impure environments.
1.8 ORIGIN OF CHAPTERS
Parts of this thesis originate from a number of peer-reviewed publications:
• The SCP 2012 paper: "Disnix: A toolset for distributed deployment" [van der Burg and Dolstra, 2012], as well as the earlier WASDeTT 2010 version of this paper [van der Burg and Dolstra, 2010d], covering development aspects and experimental applications, serve as the "backbone" of this thesis. Their sections have been integrated into many chapters of this thesis.
• Section 1.2 of this introduction is based on Sections 1-3 of the ICSE Cloud 2009 paper: Software Deployment in a Dynamic Cloud: From Device to Service Orientation in a Hospital Environment [van der Burg et al., 2009].
• Chapter 3 includes several small snippets taken from Section 6 of the SCP 2012 paper.
• Chapter 4 is based on the SEAA 2010 paper: Automated Deployment of a Heterogeneous Service-Oriented System [van der Burg and Dolstra, 2010a], combined with Sections 1-6 of the SCP 2012 paper, augmented with additional details and clarifications.
• Chapter 5 is an extension of the SEAMS 2011 paper: A Self-Adaptive Deployment Framework for Service-Oriented Systems [van der Burg and Dolstra, 2011].
• The DisnixOS section of Chapter 6 is based on Section 7 of the SCP paper.
• Chapter 7 is an extension of the ISSRE 2010 paper: Automating System Tests Using Declarative Virtual Machines [van der Burg and Dolstra, 2010b].
• Chapter 8 is loosely based on the HotSWUp 2008 paper: Atomic Upgrading of Distributed Systems [van der Burg et al., 2008], updated to reflect the current implementation and extended to cover both service and infrastructure deployment.
• Chapter 9 is an adaptation of the HotSWUp 2012 paper: A Generic Approach for Deploying and Upgrading Mutable Software Components [van der Burg, 2012].
• Chapter 10 is an adaptation of the Technical Report: Discovering Software License Constraints: Identifying a Binary's Sources by Tracing Build Processes [van der Burg et al., 2012].
• Chapter 11 contains snippets taken from Sections 6 and 7 of the WASDeTT 2010 paper.
The following technical report has not been directly used as a basis for
any chapter, but has contributed significantly to this thesis. In this paper, we
have integrated the deployment and testing disciplines for distributed sys-
tems, which is a vision we intend to realise in this thesis:
• The technical report: Declarative Testing and Deployment of Distributed Systems [van der Burg and Dolstra, 2010c].
The testing aspects of this paper have been continued in the ISSRE paper.
2 Background: Purely Functional Software Deployment
ABSTRACT
The goal of this thesis is to design a reference architecture for distributed
software deployment, utilising the Nix package manager, which implements the
purely functional deployment model. Nix offers several unique advantages over
conventional deployment approaches. Because this deployment model is
relatively unknown, this chapter explains its background concepts. The
concepts described in this chapter are earlier scientific contributions,
updated to reflect the implementation as it is today.
2.1 BACKGROUND
The Nix package manager has been developed as part of Eelco Dolstra’s PhD
thesis [Dolstra, 2006; Dolstra et al., 2004] and has been designed to overcome
certain limitations of conventional package managers and deployment tools,
such as RPM [Foster-Johnson, 2003], by borrowing concepts from purely func-
tional programming languages, such as Haskell [Hudak, 1989].
A big drawback of conventional deployment tools is that most of them do not
manage packages in isolation. Instead, their contents are stored in global
directories, such as /usr/lib and /usr/include on Linux or C:\Windows\System32
on Microsoft Windows platforms. The contents of these directories are shared
with other components.
Global directories introduce several problems to the reliability and repro-
ducibility of deployment processes. For example, dependency specifications
of packages could be incomplete, while the package may still be successfully
built on a particular machine, because a missing dependency can still be im-
plicitly found. Furthermore, it is also possible that the deployment of a pack-
age may imperatively modify or remove files belonging to another package,
which could break software installations.
Another major drawback is that conventional package managers use nom-
inal dependency specifications. For example, an RPM package may specify a
dependency by using a line, such as: Requires: openssl >= 1.0.0 to indicate that
a package requires OpenSSL version 1.0.0 or later. A nominal specification
has several drawbacks. For example, the presence of an OpenSSL package
conforming to this specification may not always yield a working deployment
scenario: OpenSSL may, for instance, have been compiled with an older version
of GCC having an Application Binary Interface (ABI) that is incompatible
with the given package.
Upgrades of systems are also potentially dangerous and destructive, as these
imperative operations cannot be trivially undone. Most importantly, while
upgrading, a system is temporarily inconsistent, because packages contain
files belonging to both the old version and the new version at the same time.
It may also very well happen that an upgrade of an older component does not
give the same result as a fresh installation.
2.2 THE NIX PACKAGE MANAGER
The Nix package manager has been designed to overcome the limitations de-
scribed earlier. In Nix, every package is stored in isolation by taking all build
dependencies into account. Furthermore, dependencies are statically bound
to components so that, at runtime, different versions of dependencies do not
interfere with each other. Nix addresses dependencies using an exact depen-
dency specification mechanism, which uniquely identifies a particular version
or variant of a component. Furthermore, the unorthodox concepts of Nix pro-
vide several other useful features, such as a purely functional domain-specific
language (DSL) to declaratively build packages, the ability to perform atomic
upgrades and rollbacks in constant time, transparent source/binary deploy-
ment and a garbage collector capable of safely removing packages which are
no longer in use.
2.2.1 The Nix store
In order to achieve the vision of reliable, pure and reproducible deployment,
Nix stores components in a so-called Nix store, which is a special directory in
the file system, usually /nix/store. Each directory in the Nix store is a compo-
nent stored in isolation, since there are no files that share the same name. An
example of a component name in the Nix store is: /nix/store/1dp59cdv...-hello-2.7.
A notable feature of the Nix store is the component names. The first part of
the file name, e.g. 1dp59cdv..., is a SHA256 cryptographic hash [Schneier,
1996] in base-32 notation of all inputs involved in building the component.
The hash is computed over all inputs, including:
• The sources of the component
• The script that performed the build
• Any command-line arguments or environment variables passed to the build script
• All build-time dependencies, such as the compiler, linker, libraries and standard UNIX command-line tools, such as cat, cp and tar
The cryptographic hashes in the store paths serve two main goals:
• Preventing interference. The hash is computed over all inputs involved in the build process of the component. Any change, even if it is just a white space in the build script, will be reflected in the hash. For all dependencies of the component, the hash is computed recursively. If two component compositions are different, they will have different paths in the Nix store. Because hash codes reflect all inputs used to build the component, the installation or uninstallation of a component will not interfere with other configurations.
• Identifying dependencies. The hash in the component file names also prevents the use of undeclared dependencies. Instead of searching for libraries and other dependencies in global name spaces such as /usr/lib, we have to give paths to the components in the Nix store explicitly. For instance, the following compiler instruction:
$ gcc -o test test.c lib.c -lglib
will fail because we have to specify the explicit path to the glib library. If we pass the -L/nix/store/3a40ma...-glib-2.28.8/lib argument to the compiler, the command will succeed, because it is able to find a specific glib library due to the path specification that we have provided.

{stdenv, fetchurl, perl}: 1
stdenv.mkDerivation { 2
name = "hello-2.7"; 3
src = fetchurl { 4
url = mirror://gnu/hello/hello-2.7.tar.gz;
sha256 = "1h17p5lgg47lbr2cnp4qqkr0q0f0rpffyzmvs7bvc6vdrxdknngx";
};
buildInputs = [ perl ]; 5
meta = { 6
description = "GNU Hello, a classic computer science tool";
homepage = http://www.gnu.org/software/hello/;
};
}
Figure 2.1 applications/misc/hello/default.nix: A Nix expression for GNU Hello
2.2.2 Specifying build expressions
In order to build packages, Nix provides a purely functional domain-specific
language to declaratively specify build actions. Figure 2.1 shows a Nix
expression, which describes how to build GNU Hello
(http://www.gnu.org/software/hello) from source code and its build-time
dependencies:
1 Essentially, the expression is a function definition, taking three argu-
ments. The function arguments represent the build-time dependencies
needed to build the component. The stdenv argument is a component
providing a standard UNIX utility environment offering tools such as
cat and ls. Furthermore, this component also provides a GNU build
toolchain, containing tools such as gcc, ld and GNU Make. The fetchurl
argument provides a function, which downloads source archives from
an external location. The perl argument is a package containing the Perl
language interpreter and platform (http://www.perl.org).
2 The remainder of the expression is an invocation of the function stdenv.mkDerivation,
which can be used to declaratively build a component from source code
and its given dependencies.
3 This stdenv.mkDerivation function takes a number of arguments. The name
argument provides a canonical name that ends up in the Nix store path
of the build result.
4 The src parameter is used to refer to the source code of the package. In
our case, it is bound to a fetchurl function invocation, which downloads
the source tarball from a web server. The SHA256 hash code is used to
check whether the correct tarball has been downloaded.
5 The buildInputs parameter is used to add other build-time dependencies
to the environment of the builder, such as perl.
6 Every package may also contain meta attributes, such as a description
and a homepage from which the package can be obtained. These at-
tributes are not used during a build.
In Figure 2.1, we have not specified what steps we should perform to build
a component, which can be done for example by specifying the buildCom-
mand argument or specifying build phases. If no build instructions are given,
Nix assumes that the package can be built by invoking the standard GNU
Autotools [MacKenzie, 2006] build procedure, i.e. ./configure; make; make install.
In principle, any build tool can be invoked in a Nix expression, as long as it
is pure. In our Nix packages repository, we have many types of packages and
build tools, such as Apache Ant (http://ant.apache.org), Cabal
(http://www.haskell.org/cabal), SCons (http://www.scons.org), and CMake
(http://www.cmake.org).
Apart from compiling software from source code, it is also possible to de-
ploy binary software by copying prebuilt binaries directly into the output
folder. For some component types, such as ELF executables (the binary format
for executables and libraries used on Linux and several other UNIX-like
systems), patching is required, because these executables normally look in
default locations, such as /usr/lib, to find their run-time dependencies,
which fails because those dependencies cannot be found there. For this
purpose, a tool called PatchELF (http://nixos.org/patchelf) has been developed,
which makes it possible to change the RPATH field of an ELF executable, so that
the run-time dependencies of prebuilt binaries can be found.

rec { 7
hello = import ../applications/misc/hello { 8
inherit fetchurl stdenv perl; 9
};
perl = import ../development/interpreters/perl {
inherit fetchurl stdenv;
};
fetchurl = import ../build-support/fetchurl {
inherit stdenv; ...
};
stdenv = ...;
}
Figure 2.2 all-packages.nix: A partial composition expression
2.2.3 Composing packages
As the Nix expression in Figure 2.1 defines a function, we cannot use it to
directly build a component. We need to compose the component, by calling
the function with the required function arguments.
Figure 2.2 shows a partial Nix expression, which provides compositions by
calling build functions with their required arguments:
7 Essentially, this expression is an attribute set of name and value pairs,
i.e. rec { name_1 = value_1; ...; name_n = value_n; }, in which attributes are
allowed to refer to each other, hence the rec keyword. Each attribute defines
a package, while its value refers to a build function invocation.
8 The import statement is used to import an external Nix expression. For
example, the hello attribute refers to the Nix expression declaring a build
function, which we have shown earlier in Figure 2.1.
9 The arguments of this function invocation refer to the other attributes
defined in the same composition expression, meaning that they are built first,
before the GNU Hello package gets built. The inherit perl; argument is
syntactic sugar for perl = perl;, which inherits an attribute from its lexical
scope. By providing a different argument, e.g. perl = myOtherPerl;, a different
composition can be achieved. Because in many cases the function arguments are
identical to the attribute names in the composition expression, Nix provides a
callPackage function, which automatically inherits all function arguments,
apart from a given number of specified exceptions.
In principle, all packages, including all their dependencies, must be defined
as Nix expressions in order to deploy them reliably with the Nix package
manager, although there are ways to refer to files stored outside the Nix
store, which is not recommended. Because everything must be specified, writing
deployment specifications could be a burden. Fortunately, there is a repository
available, called Nixpkgs (http://nixos.org/nixpkgs), which contains about 2500
packages, covering many well-known free and open source components as well as
some proprietary ones, such as build tools, utilities and end-user
applications. The expressions for these components in Nixpkgs can be reused to
reduce the burden of specifying everything.
On Linux platforms, Nixpkgs is fully bootstrapped, which means that there
are no references to components outside the Nix store. On other platforms,
such as FreeBSD, Mac OS X and Cygwin, some essential system libraries are
referenced on the host system.
Recently, we have also implemented experimental support for .NET de-
ployment on Microsoft Windows platforms. For deployment of these compo-
nent types, it is impossible to isolate certain components, such as the .NET
framework. There are some tricks to make it as pure as possible and to con-
veniently provide deployment support through Nix, although these solutions
are non-trivial. Chapter 12 contains more information about this.
2.2.4 Building packages
By using the composition expression, shown in Figure 2.2, the GNU Hello
package, including all its required build-time dependencies, can be built by
running the following command-line instruction:
$ nix-build all-packages.nix -A hello
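A session might look roughly as follows (a sketch: the hash is the truncated example path used throughout this chapter, and nix-build also creates a result symlink in the current directory pointing to the output path):

$ nix-build all-packages.nix -A hello
/nix/store/1dp59cdv...-hello-2.7
$ ./result/bin/hello
Hello, world!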
Figure 2.3 shows all build-time dependencies of the GNU Hello package,
built on Linux. As may be observed, the graph is quite large and complicated.
Currently, the GNU Hello package has 163 build-time dependencies.
While performing a build of a Nix package, Nix tries to remove many
side effects from a build process. For example, all environment variables
are cleared or changed to non-existing values, such as the PATH environment
variable (it is not cleared, but set to /path-not-set to prevent the shell from
filling it with default values).
. We have also patched various essential build tools, such as gcc, so
that they ignore impure global directories, such as /usr/include and /usr/lib. Op-
tionally, builds can be performed in a dedicated chroot() environment, which
restricts access to the root of the filesystem of the builder process to a sub di-
rectory, in which only the specified dependencies (and a number of required
files to run a build) are visible and anything else is invisible, giving even
better guarantees of purity.
Because Nix removes as many side effects as possible, and because every
component is isolated in the Nix store, dependencies can only be found by
explicitly setting search directories, such as PATH, to contain the Nix store
paths of the components that we want to use. Undeclared dependencies, which
may reside in global directories such as /usr/lib, cannot accidentally let a
build succeed.

Figure 2.3 Build-time graph of GNU Hello on Linux, containing all build-time dependencies of GNU Hello, including the compiler's compiler, glibc and the Linux kernel headers.
Another advantage is that it is safe to assume that a component with the
same hash code built on another machine is equivalent, assuming that the
remote machine can be trusted (this has some caveats, as described by Dolstra
and Löh [2008]). By taking this knowledge into account, Nix also supports
binary deployment: if a component with the same hash code exists elsewhere, we
can download a substitute from an external location rather than building it
ourselves, significantly reducing the build time.
Furthermore, to reduce the download time of substitutes and the overall
deployment time, Nix also has the option to generate binary patches, for ver-
sions of components that are similar to each other [Dolstra, 2005a]. Binary
patches can be applied to an older version of a component and the result of
the patching process can be stored in a new component in the Nix store, hav-
ing the hash of the newer component. Although binary patches may speed up
the deployment time for large components, it may actually increase the de-
ployment time of small components, because generating and applying patches
may take more time than downloading a fresh version.
Another implication is that if we want to build a particular package for a
different type of platform (e.g. a FreeBSD package, while our host system
is a Linux system), we can also forward a build to a system of the required
type and copy the build result back. A system architecture identifier is also
reflected in the hash code of a Nix store path, and thus does not interfere
with components built for a different architecture. As we will see later in this
thesis, this property also provides several useful features.
2.2.5 Runtime dependencies
Nix is capable of detecting runtime dependencies from a build process. This
is done by conservatively scanning a component for the hash codes of build-
time dependencies that uniquely identify a component in the Nix store. For
instance, libraries that are needed to run a specific executable are stored in
the RPATH field in the header of the ELF executable, which specifies a list of
directories to be searched by the dynamic linker at runtime (Nixpkgs contains
a wrapped ld command, which automatically sets the RPATH of an ELF binary so
that shared libraries are statically bound to an executable). The RPATH header
of the GNU Hello executable in our last example may contain a path to the
standard C library, which is /nix/store/6cgraa4mxbg1...-glibc-2.13. Apart
from RPATH fields, Nix can also detect store paths in many other types of
files (as long as the hashes are in a detectable format), such as text files, shell
scripts and symlinks.
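The references that Nix has detected for a store path can be inspected with the nix-store command; a minimal sketch, reusing the truncated example paths of this chapter:

$ nix-store --query --references /nix/store/1dp59cdv...-hello-2.7
/nix/store/6cgraa4mxbg1...-glibc-2.13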
By scanning the hello component we know that a specific glibc component
in the Nix store is a runtime dependency of GNU Hello. Figure 2.4 shows
the runtime dependencies of the GNU Hello component in the Nix store.
/nix/store
1dp59cdvjql2...-hello-2.7
bin
hello
man
man1
hello.1
6cgraa4mxbg1...-glibc-2.13
lib
libc.so.6
p7gs4gixwdbm...-subversion-1.7.2
bin
svn
lib
libsvn...so
fwkrhfb0zavr...-openssl-1.0.0g
lib
libssl.so.1.0.0
qvvch57sx4lw...-db-4.5.20
lib
libdb-4.so
Figure 2.4 Runtime dependencies of the GNU Hello component
Although the scanning technique may sound risky, we have used this for
many packages and it turns out to work quite well.
Nix also guarantees complete deployment, which requires that there are no
missing dependencies. Thus we need to deploy closures of components un-
der the “depends on” relation. If we want to deploy component X which
depends on component Y, we also have to deploy component Y before we de-
ploy component X. If component Y has dependencies we also have to deploy
its dependencies first and so on.
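The closure of a component can be queried in a similar way; a minimal sketch, again with the truncated example paths:

$ nix-store --query --requisites /nix/store/1dp59cdv...-hello-2.7
/nix/store/6cgraa4mxbg1...-glibc-2.13
/nix/store/1dp59cdv...-hello-2.7

The resulting list contains the component itself and everything it transitively depends on, which is exactly the set of paths that must be present to deploy the component completely.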
2.2.6 Nix profiles
Storing components in a Nix store in isolation from each other is a nice fea-
ture, but not very user friendly. For instance, if a user wants to start the
program hello, then the full path to the hello executable in the Nix store must
be specified, e.g. /nix/store/1dp59cdv...-hello-2.7/bin/hello.
To make components more accessible to end users, Nix provides the con-
cept of Nix profiles, which are specific user environments composed of symlink
trees that abstract over store paths.
By adding the /home/user/.nix-profile/bin directory to the PATH environment
variable, which resolves indirectly to the store paths of each component, the
user can start the hello component activated in their profile by simply typing
hello on the command-line. The symlink tree that is dereferenced is illustrated
in Figure 2.5.
/home/sander/.nix-profile
/nix/var/nix/profiles
default
default-23
default-24
/nix/store
a2ac18h4al2c...-user-env
bin
hello
y13cz8h6fliz...-user-env
1dp59cdvjql2...-hello-2.7
bin
hello
man
man1
hello.1
6cgraa4mxbg1...-glibc-2.13
lib
libc.so.6
Figure 2.5 An example Nix profile that contains the GNU Hello component
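The chain of symlinks can be followed on the command-line; a minimal sketch, assuming the example profile shown above:

$ which hello
/home/sander/.nix-profile/bin/hello
$ readlink -f $(which hello)
/nix/store/1dp59cdvjql2...-hello-2.7/bin/hello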
The set of available components in the environment of the user can be modified
using the nix-env command. To add a component to the Nix environment, for
instance the GNU Hello component, the user can type:
$ nix-env -f all-packages.nix -i hello
By executing the command above, the package will be built and made avail-
able in the Nix profile of the user, because symlinks are made that point to
the files of the GNU Hello component.
In order to upgrade the hello component the user can type:
$ nix-env -f all-packages.nix -u hello
Uninstalling the hello component can be done by typing:
$ nix-env -f all-packages.nix -e hello
The uninstall operation does not delete the specific GNU Hello component
from the Nix store. It just removes the symlinks to the GNU Hello package
from the environment. The Hello component could be in use in the Nix profile
of another user or by a running process, so it is not safe to remove it.
If it turns out that the new version of the GNU Hello component does not
meet our requirements or we regret that we have uninstalled the GNU Hello
component, we can roll back to a previous version of the user environment by
typing:
$ nix-env --rollback
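Rollbacks are possible because nix-env records every modification of a profile as a new generation. A minimal sketch of inspecting and selecting generations (the generation numbers correspond to the default-23 and default-24 profiles shown in Figure 2.5):

$ nix-env --list-generations      # show all generations of the current profile
$ nix-env --switch-generation 23  # switch back to a specific generation
$ nix-env --rollback              # or simply go back one generation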
Because multiple variants of packages in the Nix store can safely coexist
without interference, ordinary users without root privileges can install
packages without affecting the packages in the Nix profiles of other users
(and without, for example, being able to inject a trojan horse into them). It
is even possible to allow packages coming from different sources (including
untrusted sources) to safely coexist. More information on this can be found in
[Dolstra, 2005b].
2.2.7 Atomic upgrades and rollbacks
The deployment steps in Nix are atomic. At any point in time, running the
command hello from the command-line either runs the old or the new ver-
sion of GNU Hello, but never an inconsistent mix of the two. The user’s PATH
environment variable contains the path /nix/var/nix/profiles/default/bin, which indi-
rectly resolves to a directory containing symlinks to the currently “installed”
programs. Thus, /nix/var/nix/profiles/default/bin/hello resolves to the currently ac-
tive version of GNU Hello. By flipping the symlink /nix/var/nix/profiles/default
from default-23 to default-24, we can switch to the new configuration. Replacing
a symlink can be done atomically on UNIX, hence upgrading can be done
atomically.
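The underlying idiom is the atomic rename() system call: a new symlink is created under a temporary name and then renamed over the old one. A minimal sketch of the general technique (not the literal Nix implementation):

$ cd /nix/var/nix/profiles
$ ln -s default-24 default.tmp    # create the new symlink under a temporary name
$ mv -T default.tmp default       # atomically replace the old symlink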
2.2.8 Garbage collection
The Nix garbage collector, which has to be called explicitly, checks which
components have become obsolete and safely deletes them from the Nix store.
Components that are part of a Nix profile, referenced by another component, or
in use by running processes and open files are treated as garbage collector
roots and therefore identified as being in use. Thus it is safe to run the
garbage collector at any time.
The garbage collector can be invoked by typing:
$ nix-collect-garbage
It is also possible to remove all older generations of a Nix user profile,
together with the garbage that results from it, by typing:
$ nix-collect-garbage -d
This command also removes the older default-23 symlink and collects all the
garbage that becomes unreachable as a result.
2.2.9 Store derivations
Components are not directly built from Nix expressions but from lower-level
store derivation files. Figure 2.6 shows the two-stage building process of Nix
expressions.
The Nix expression language is not used directly, because it is fairly high
level and subject to evolution. Furthermore, it is difficult to uniquely identify
a Nix expression, as Nix expressions can import other Nix expressions that
are scattered across the filesystem.
Figure 2.6 Two-stage building of Nix expressions
Derive(
[("out", "/nix/store/1dp59cdvjql2l3y0j8a492vsf3lyr6ln-hello-2.7", "", "")]
10
, [ ("/nix/store/dvrn7nnqm4d1hvm2b6hxqcfzqkjfqgsy-hello-2.7.tar.gz.drv", ["out"]) 11
, ("/nix/store/g9ciis17a2dnqxfhv40iis964l08cbl2-bash-4.2-p20.drv", ["out"])
, ("/nix/store/gi7a993hcndzdf09f2i08wqc7d91m1nn-stdenv.drv", ["out"])
]
, ["/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"] 12
, "x86_64-linux" 13
, "/nix/store/xkzklx3jbds21xgry79hk00gszhanjq0-bash-4.2-p20/bin/bash"
14
, ["-e", "/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"]
, [ ("buildInputs", "") 15
, ("buildNativeInputs", "")
, ("builder", "/nix/store/xkzklx3jbds21xgry79hk00gszhanjq0-bash-4.2-p20/bin/bash")
, ("doCheck", "1")
, ("name", "hello-2.7")
, ("out", "/nix/store/1dp59cdvjql2l3y0j8a492vsf3lyr6ln-hello-2.7")
, ("propagatedBuildInputs", "")
, ("propagatedBuildNativeInputs", "")
, ("src", "/nix/store/ycm2pz2bln77gpjxvyrnh6k1hg9c4xik-hello-2.7.tar.gz")
, ("stdenv", "/nix/store/7c8asx3yfrg5dg1gzhzyq2236zfgibnm-stdenv")
, ("system", "x86_64-linux")
]
)
Figure 2.7 Structure of the GNU Hello store derivation
For all these reasons, Nix expressions are not built directly but translated to
store derivation files encoded in a more primitive language. A store derivation
describes the build action of one single component. Store derivation files are
also placed in the Nix store and have a store path as well. Keeping store
derivation files inside the Nix store gives us a unique identification of
source deployment objects.
Figure 2.7 shows the structure of the Nix store derivation used to build the
GNU Hello component:
10 This line contains a list of attributes containing Nix store output paths
(in which the results of the build are stored) and variable names. In this
example, we only use one output directory, which corresponds to out.
11 In this section, all the build inputs are specified, which are components
that must be built before the Nix component that produces its output in
out.
12 This line specifies the path to the default builder script.
13 This line specifies for which system architecture the component must be
built. If this system identifier does not match the host's system
architecture, Nix can optionally forward the build to an external machine
capable of building it.
14 This line refers to the program that performs the build procedure. The bash
shell is used to execute the given build script in an isolated environment, in
which environment variables are cleared or set to uninitialised values.
Optionally, the script is jailed in a chroot() environment to ensure a high
degree of purity.
15 The remaining lines specify all the environment variables that are passed
to the builder script, so that the build script knows certain build at-
tributes, where to find dependencies and where to store the build re-
sults.
By invoking the nix-instantiate command, it is possible to translate a specific
Nix expression into store derivation files. For instance:
$ nix-instantiate hello.nix
/nix/store/dgrsz4bsk5srrb...-hello-2.7.drv
The nix-instantiate command returns the list of store derivation files that it
has generated from the expression. By executing the nix-store --realise
command, it is possible to execute the build action described in a store
derivation file. For instance:
$ nix-store --realise /nix/store/dgrsz4bsk5srrb...-hello-2.7.drv
/nix/store/1dp59cdvjql2...-hello-2.7
When the build action has finished, it returns the store path of the resulting
component that it has built.
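For reference, the nix-build command used earlier combines both stages; a rough sketch of the equivalence:

$ nix-build hello.nix
# is roughly equivalent to:
$ nix-store --realise $(nix-instantiate hello.nix)

in addition to which nix-build creates a result symlink to the build output in the current directory.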
2.3 NIXOS
Nix is a tool to manage packages and can be used on a wide range of oper-
ating systems, such as various Linux distributions, FreeBSD, Mac OS X and
Microsoft Windows (through Cygwin). Nix has also been used as a basis for
NixOS, which is a GNU/Linux distribution completely managed through Nix
including all operating system specific parts, such as the kernel and configu-
ration files [Dolstra and Löh, 2008; Dolstra et al., 2010].
Because Nix stores all packages in a special directory, /nix/store, the NixOS
distribution does not have most of the Filesystem Hierarchy Standard (FHS)
[The Linux Foundation, 2012] directories, such as /usr and /lib, and has only a
minimal /etc and /bin. The /bin directory only contains a symlink /bin/sh
pointing to the bash shell, as this is required to make the system() library
call work properly.
{pkgs, ...}: 16
{
boot.loader.grub.device = "/dev/sda"; 17
fileSystems = [ 18
{ mountPoint = "/";
device = "/dev/sda1";
}
];
swapDevices = [ { device = "/dev/sda2"; } ]; 19
services = { 20
openssh.enable = true;
httpd = {
enable = true;
documentRoot = "/var/www";
adminAddr = "admin@localhost";
};
xserver = {
enable = true;
desktopManager = {
default = "kde4";
kde4 = { enable = true; };
};
displayManager = {
kdm = { enable = true; };
slim = { enable = false; };
};
};
};
environment.systemPackages = [ 21
pkgs.mc
pkgs.subversion
pkgs.firefox
];
}
Figure 2.8 configuration.nix: A NixOS configuration file
2.3.1 NixOS configurations
NixOS is more than just a Linux distribution managed by Nix. What sets NixOS
apart from conventional Linux distributions is that entire system
configurations are derived from a declarative specification, which invokes
functions that generate configuration files and build the required packages,
just as individual Nix packages are built from declarative specifications.
Ordinary Linux distributions are often configured by manually installing
packages and manually editing configuration files, such as an Apache web
server configuration.
Figure 2.8 shows an example of a NixOS configuration, capturing proper-
ties of a complete system:
16 A NixOS configuration is essentially a function providing a reference to
the Nixpkgs collection, through the pkgs function argument. The remainder of
the file is an attribute set, capturing machine-specific properties.
17 This attribute defines configuration settings of the GRUB
(http://www.gnu.org/software/grub) bootloader, specifying that the bootloader
must be installed in the Master Boot Record (MBR) of the first SCSI hard drive.
18 The fileSystems attribute defines the properties of the file systems, such
as their mount points. In our example, we have specified that our root
partition corresponds to the first partition of the first SCSI hard drive.
19 The swapDevices attribute contains a list of swap partitions, which in our
case is the second partition of the first SCSI hard drive.
20 The services attribute enables a number of system services and provides
their configuration properties. In our example, we have enabled OpenSSH, the
Apache web server and an X.Org server configured to use the KDE Plasma desktop
and the KDM login manager.
21 Also, system packages are defined here, such as Mozilla Firefox and
Midnight Commander. These packages are automatically added to the
PATH of all users and also appear automatically in the KDE Plasma Desk-
top’s program launcher menu, if .desktop files are included (specifica-
tions which KDE and GNOME use to define entries in their program
launcher menu).
By storing a NixOS configuration file in /etc/nixos/configuration.nix and
running nixos-rebuild switch, a complete system configuration is built by the
Nix package manager and activated. Because of the distinct properties of the
Nix package manager, we can safely upgrade a complete system: the complete
system configuration and all its dependencies are safely stored in the Nix
store and no files are overwritten or removed. By running nixos-rebuild
--rollback switch, we can roll back to any previous generation of the system
which has not been garbage collected yet.
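A minimal sketch of the corresponding command-line workflow (assuming the standard NixOS system profile location /nix/var/nix/profiles/system):

$ nixos-rebuild switch              # build the new configuration and activate it
$ nixos-rebuild --rollback switch   # switch back to the previous generation
$ nix-env -p /nix/var/nix/profiles/system --list-generations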
Figure 2.9 shows the GRUB bootloader screen of NixOS. The bootloader
makes it possible to pick and boot into any available previous NixOS config-
uration at boot time.
2.3.2 Service management
As mentioned earlier, NixOS specifications are used to deploy a complete sys-
tem. The result of building a NixOS configuration is a Nix closure containing
all packages, configuration files and other static parts of a system. Apart from
ordinary packages, Nix can also be used to do service management, by gener-
ating configuration files and shell scripts referring to packages [Dolstra et al.,
2005].
Figure 2.9 The GRUB bootloader of NixOS
{ config, pkgs }:
pkgs.writeText "sshd_config" ’’
22
SendEnv LANG LC_ALL ... 23
${if config.services.sshd.forwardX11 then ’’ 24
ForwardX11 yes
XAuthLocation ${pkgs.xorg.xauth}/bin/xauth 25
’’ else ’’
ForwardX11 no
’’}
Figure 2.10 Nix expression generating an OpenSSH server configuration file
Figure 2.10 shows an example Nix expression generating a configuration
file for the OpenSSH server. The example has been taken from the OpenSSH
example described in the NixOS paper [Dolstra et al., 2010]:
22 In the body of this expression, we invoke the pkgs.writeText function gen-
erating an OpenSSH server configuration file, called sshd_config.
23 The last argument of the pkgs.writeText function invocation is a string,
which provides the actual contents of the configuration file.
24 In this string, we use antiquoted expressions to optionally enable X11
forwarding, which can be enabled by setting the services.sshd.forwardX11
option in the NixOS configuration file.
25 If the forwardX11 option is enabled, we provide the location to the xauth bi-
nary by evaluating the function that builds it. As a side effect, xauth gets
built from source code. Due to the laziness of the Nix expression language,
xauth only gets compiled if it is needed.

SendEnv LANG LC_ALL ...
ForwardX11 yes
XAuthLocation /nix/store/kyv6n69a40q...-xauth-1.0.2/bin/xauth
Figure 2.11 Example result of evaluating the OpenSSH server configuration file

{config, pkgs, ...}:
{
options = { ... };
config = { ... };
}
Figure 2.12 General structure of a NixOS module
Figure 2.11 shows a possible result of the Nix expression, if the X11 for-
warding option is enabled. This configuration file contains an absolute path
to the xauth binary, which resides in a Nix store directory. After building the
configuration file, Nix identifies the xauth binary as a run-time dependency
(because it scans and recognises the hash code, as explained earlier in Sec-
tion 2.2.5) and ensures that this dependency is present when transferring this
configuration file to another machine.
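The transfer itself can be performed with the nix-copy-closure tool, which copies a store path together with its entire closure to another machine over SSH; a minimal sketch (the hostname is hypothetical and the store path hash is elided):

$ nix-copy-closure --to root@machine.example.org \
    /nix/store/...-sshd_config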
2.3.3 NixOS modules
So far, we have shown that we can build packages as well as configuration files
by using Nix expressions. To make building of a Linux distribution manage-
able and extensible, NixOS is composed of a number of NixOS modules, each
capturing a separate concern of a system. NixOS modules are merged into
a single system configuration, which can be used to build the entire system
through its dependencies.
Figure 2.12 shows the general structure of a NixOS module. A module is a
function in which the config argument represents the full system configuration
and the pkgs argument represents the Nixpkgs collection, which can be used
for convenience. In the remainder of the expression, we define the options
a module provides (defined in the options attribute) and what configuration
settings it provides (defined in the config attribute).
Figure 2.13 shows an example capturing the structure of the OpenSSH
NixOS module:
{ config, pkgs, ... }:
with pkgs.lib;
{
options = { 26
services.openssh.enable = mkOption {
default = false;
description = "Whether to enable the Secure Shell daemon.";
};
services.openssh.forwardX11 = mkOption {
default = true;
description = "Whether to allow X11 connections to be forwarded.";
};
};
config = mkIf config.services.openssh.enable { 27
users.extraUsers = singleton 28
{ name = "sshd";
uid = 2;
description = "SSH privilege separation user";
home = "/var/empty";
};
jobs.openssh = 29
{ description = "OpenSSH server"; 30
startOn = "started network-interfaces"; 31
preStart = 32
''
mkdir -m 0755 -p /etc/ssh
if ! test -f /etc/ssh/ssh_host_dsa_key; then
${pkgs.openssh}/bin/ssh-keygen ...
fi
'';
exec = 33
let sshdConfig = pkgs.writeText "sshd_config" (elided; see Figure 2.10);
in "${pkgs.openssh}/sbin/sshd -D ... -f ${sshdConfig}";
};
networking.firewall.allowedTCPPorts = [ 22 ]; 34
};
}
Figure 2.13 openssh.nix: NixOS module for the OpenSSH server daemon
26 The OpenSSH module defines two options: whether to enable the OpenSSH
service and whether to use X11 forwarding.
27 In the remainder of the expression, we configure the module settings, which
is done by defining the config attribute set. The mkIf function is used to
check whether the OpenSSH option is enabled in the NixOS configuration file
provided by the user. mkIf is a "delayed if" function: with an ordinary if
expression, the evaluation of the NixOS configuration would run into an
infinite loop.
28 The extraUsers attribute is used by NixOS to create user accounts on first
startup. In our expression, it is used to create an SSH privilege separation
user, used by OpenSSH for security reasons.
29 The jobs.openssh attribute set can be used to define an OpenSSH job for
Upstart (http://upstart.ubuntu.com), the system's init daemon, which is used
for starting and stopping system services.
13
http://upstart.ubuntu.com
30 The description attribute gives the Upstart job a description, which is
displayed in the status overview shown by running: initctl list
31 The startOn attribute defines an Upstart event. In this case, we have
specified that the job should be started after the network interfaces have
been started.
32 The preStart attribute defines a script, which is executed before the actual
job is started. In our job, we generate an SSH host key on initial startup.
33 The exec attribute defines the command that the job runs. In our case, we
start the OpenSSH daemon sshd with the generated sshd_config file (described
earlier in Figure 2.10) as a parameter, so that it uses the settings that we
want.
34 The networking.firewall attribute states that TCP port 22 must be open in
the firewall to allow connections from the outside.
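To make the role of mkIf more concrete, the following minimal sketch shows a hypothetical module (the option name services.myservice.enable is purely illustrative and not part of NixOS) in which replacing mkIf by an ordinary if expression would cause the evaluation to recurse infinitely:

  { config, pkgs, ... }:

  with pkgs.lib;

  {
    options = {
      services.myservice.enable = mkOption {
        default = false;
        description = "Whether to enable the hypothetical example service.";
      };
    };

    # Writing `config = if config.services.myservice.enable then { ... } else { ... };`
    # would require the final merged value of `config` in order to compute this
    # module's own contribution to `config`, which never terminates. mkIf instead
    # records the condition, so the module system can merge all definitions first
    # and apply the condition afterwards.
    config = mkIf config.services.myservice.enable {
      networking.firewall.allowedTCPPorts = [ 8080 ];
    };
  }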
2.3.4 Composing a Linux system
The config attribute set of every module collectively defines the full system
configuration, which is computed by merging all modules together. Multiple
definitions of an option are combined, for example by merging all values into
a single list, which is done for the networking.firewall.allowedTCPPorts attribute.
The end result may be:
networking.firewall.allowedTCPPorts = [ 22 80 ];
TCP port 80 may appear in this list if, for example, an HTTP service is
enabled as well.
The build result of a NixOS configuration is a Nix profile (which is called
a NixOS system profile) consisting of symlinks to all static parts of the system,
such as all system packages and configuration files. A NixOS system profile
also contains an activation script taking care of all “impure” deployment steps,
such as creating state directories (e.g. /var), starting and stopping the Upstart
jobs representing system services, and updating the GRUB bootloader so that
users can boot into the new system configuration.
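As an illustration, the following minimal sketch of a NixOS configuration enables the OpenSSH module of Figure 2.13 and contributes an additional firewall port; the set of options used is deliberately small and only meant as an example:

  { config, pkgs, ... }:

  {
    # Enable the OpenSSH service declared by the module in Figure 2.13 and
    # override the default for X11 forwarding.
    services.openssh.enable = true;
    services.openssh.forwardX11 = false;

    # This definition is merged with the [ 22 ] contributed by the OpenSSH
    # module, so the final system configuration opens both ports.
    networking.firewall.allowedTCPPorts = [ 80 ];
  }

Building and activating the resulting system profile can then be left to the standard NixOS tooling, such as the nixos-rebuild command.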
2.4 H Y D R A
Another important Nix application is Hydra, a continuous build and integra-
tion server built on top of Nix [Dolstra and Visser, 2008]. The purpose of
Hydra is to continuously check out source code repositories and to automatically
build and test the packages residing in them. Furthermore, it provides a web
application interface, which can be used to track build jobs and which allows
end-users to configure jobs.
is shown in Figure 2.14.
Figure 2.14 A Hydra build queue
Typically, Hydra is attached to a cluster of build machines. As we have
explained earlier, we can also safely delegate builds to other machines in the
network for various reasons. In Hydra, this feature is extensively used to
cluster builds in order to reduce build times and to forward a build to a machine
capable of building a package for a particular platform. In our SERG Hydra
build environment at TU Delft (http://hydra.nixos.org), we have a Hydra cluster
containing Linux,
Microsoft Windows (Cygwin), Mac OS X (Darwin), FreeBSD and OpenSolaris
machines.
Because Hydra is built on top of Nix, it has several distinct advantages
compared to other continuous integration solutions, such as Hudson [Moser
and O’Brien, 2011] or Buildbot [Buildbot project, 2012]. For example, the
build environment itself is fully managed, so that undeclared dependencies
cannot accidentally cause a build or test to succeed or fail; such accidental
outcomes are undesirable because they give us no guarantee that a package
will work in a different environment. Especially for system integration tests
this is a very strict requirement, as side effects affect the reproducibility of test
cases, a property which is difficult to guarantee using conventional deployment
tools.
{ nixpkgs }: 35

let
  jobs = rec { 36
    tarball = 37
      { disnix, officialRelease }:

      with import nixpkgs {};

      releaseTools.sourceTarball {
        name = "disnix-tarball";
        version = builtins.readFile ./version;
        src = disnix;
        inherit officialRelease;
        buildInputs = [ pkgconfig dbus_glib libxml2 libxslt dblatex tetex doxygen ];
      };

    build = 38
      { tarball, system }:

      with import nixpkgs { inherit system; };

      releaseTools.nixBuild {
        name = "disnix";
        src = tarball;
        buildInputs = [ pkgconfig dbus_glib libxml2 libxslt ]
          ++ lib.optional (!stdenv.isLinux) libiconv
          ++ lib.optional (!stdenv.isLinux) gettext;
      };

    tests = 39
      { nixos, disnix_activation_scripts }:
      ...
  };
in
jobs
Figure 2.15 release.nix: A Hydra build expression for Disnix
2.4.1 Defining Hydra build jobs
Figure 2.15 shows an example of a Hydra expression for Disnix, a component
from our reference architecture, which we will cover later in Chapter 4 of this
thesis:
35 In principle, a Hydra expression is a function returning an attribute set.
In the function header, the nixpkgs argument specifies that we need a
checkout of Nixpkgs.
36 In the resulting attribute set, each attribute represents a job. In our
example, we have defined three jobs, namely: tarball, build and tests. Jobs
can also depend on the result of other jobs and must be able to refer to
each other. To make this possible, the rec keyword is used.
37 Each job is also defined as a function, which performs a Nix build. The
tarball attribute defines a function, which generates a source tarball from a
Subversion checkout of Disnix. This function takes two arguments: disnix
refers to a Subversion source code checkout of Disnix and officialRelease
indicates whether the build is an official release. Unofficial releases have
the Subversion revision number appended to the file names of the resulting
packages. Essentially, the build process of the tarball job executes the
configure phase and runs make dist, which generates the source tarball.
38 The build job unpacks the generated tarball from the tarball job and
builds it from source code. This function takes two arguments. The
tarball argument refers to a Nix store component containing a generated
source code tarball. The system argument refers to a system identifier,
specifying for which architecture the component must be built. For ex-
ample, by providing i686-linux, the package is built for a 32-bit Intel Linux
platform.
39 The tests job is used to perform system integration testing of Disnix. This
function takes two function arguments. The nixos argument refers to
a checkout of the NixOS source code, which we require for integration
testing. The disnix_activation_scripts argument refers to a build of the Dis-
nix activation scripts, which is a required run-time dependency of Dis-
nix. In the remainder of the expression, test cases are specified, which
we have left out of Figure 2.15 for simplicity. Later in Chapter 7 we show
how we can perform distributed system integration tests.
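Hydra supplies the arguments of these job functions itself, based on the jobset configuration described in the next section. Conceptually, however, the way the jobs of Figure 2.15 depend on each other can also be expressed directly in the Nix expression language. The following sketch shows such a composition; the checkout paths are purely hypothetical placeholders:

  let
    # Import the release expression of Figure 2.15, providing a Nixpkgs checkout.
    release = import ./release.nix { nixpkgs = ./nixpkgs-checkout; };

    # Produce the source tarball from a (hypothetical) Disnix checkout.
    tarball = release.tarball {
      disnix = ./disnix-checkout;
      officialRelease = false;
    };
  in
  # Build the unpacked tarball for a 64-bit Intel Linux platform.
  release.build {
    inherit tarball;
    system = "x86_64-linux";
  }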
2.4.2 Configuring Hydra build jobs
As with ordinary Nix expressions, the Hydra expression for Disnix defined
in Figure 2.15 does not specify how these jobs are composed and which
dependencies they should use. The function arguments to these jobs can be
provided by configuring jobsets through the Hydra web interface.
Figure 2.16 shows an example of a Disnix job configuration:
The disnix argument is configured to take the result of a Subversion ex-
port operation, which always checks out the latest revision of Disnix.
The disnix_activation_scripts argument is configured to take the latest build
result of the Hydra job that continuously builds it.
The nixos argument is configured to take the result of a Subversion ex-
port operation, which always checks out the latest revision of NixOS.
The nixpkgs argument is configured to take the result of a Subversion ex-
port operation, which always checks out the latest revision of Nixpkgs.
The officialRelease argument is configured to false, which means that the
Subversion revision number ends up in the file name.
The system argument is a list of system identifiers. By specifying a list
of identifiers, Disnix gets built for a wide range of platforms, such as
GNU/Linux, FreeBSD, Mac OS X (Darwin), OpenSolaris and Microsoft
Windows (via Cygwin).
The tarball argument is configured to take the latest build result of the
tarball job.
Figure 2.16 Configuring a Disnix jobset through the Hydra web interface
2.5 R E L AT E D W O R K
There is a substantial amount of related work on package management. First,
there are many package managers and software installers available, working
at both the binary and the source level. Second, there is a large body of research
on techniques to improve the way deployment processes are performed by such
tools.
2.5.1 Tools
Binary package managers
As mentioned earlier, most common package managers on Linux systems,
such as RPM and the Debian package manager, use nominal dependency
specifications and imperatively install and upgrade packages, often in global
namespaces, such as /usr/lib. These properties have various drawbacks, such
as the fact that it is hard to ensure that dependency specifications are correct
and complete, and to safely upgrade systems.
Windows [Microsoft, 2012] and Mac OS X typically use monolithic de-
ployment approaches in which programs, including all their dependencies,
are bundled in a single folder, with a minimal amount of sharing. Only
highly common and trusted components are shared. Such approaches may
produce a significant overhead in terms of disk space and the amount of RAM
that programs use.
.NET applications use the Global Assembly Cache (GAC) [Microsoft, 2010a]
to prevent the “DLL hell”, which requires library assemblies to have a strong
name, composed of several nominal attributes, such as the name, version,
culture and a public/private key pair. Although the GAC provides a higher
degree of isolation compared to common deployment tools, its scope is lim-
ited to a fixed number of attributes and library assemblies only. It cannot be
used for native libraries and other types of components, such as compilers.
Source package managers
Apart from addressing the versioning and storage issues, Nix has the advantage
over source package managers, such as FreeBSD ports [FreeBSD project, 2012],
that builds are pure and reproducible. As a result, it is relatively simple
to provide binary deployment in addition to source deployment in a transpar-
ent way, which is more difficult to achieve with conventional source package
managers, due to the side effects that may occur during a build.
2.5.2 Techniques
The Mancoosi research project (http://www.mancoosi.org) is also focused on
investigating package manager techniques and has a number of goals similar
to those of Nix. However, its view is somewhat different from the “Nix way” of
deploying systems. For example, Mancoosi techniques are implemented in
common package managers, such as the Debian package manager and RPM.
A number of factors motivate the Mancoosi project to use conventional
package management tools as a basis rather than unconventional approaches
such as Nix: disk space requirements, potential security issues with old
versions of packages that may be retained on a system, and compliance with
the Filesystem Hierarchy Standard (FHS), part of the Linux Standards Base
(LSB) [The Linux Foundation, 2010].
Nix and Mancoosi also have a slightly different view on the definition of soft-
ware packages. As we have seen, Nix has an unorthodox notion of a pack-
age, since packages are deployed by direct or indirect function invocations of
stdenv.mkDerivation, which builds a package from source code and the depen-
dencies provided as function arguments. The result is a Nix store component
containing files and directories stored in isolation in the Nix store, which
can never be modified after the package has been built.
In the Mancoosi project, the definition of a package is derived from the way
conventional packages are structured. According to Di Cosmo et al. [2008], a
package consists of files, meta information (such as nominal dependencies
and conflicts), and maintainer scripts that imperatively modify systems by
“glueing” files from a package to files on the host system and by performing
modifications to the system configuration.
An important challenge is posed by conflicts between packages deployed by
conventional package managers: there are various reasons why problems
may occur when a combination of packages is installed, problems which are
absent otherwise. A study on package conflict defects by Artho et al. [2012],
in which the authors have analysed Debian and Fedora bug repositories,
divides these conflicts into several categories: resource conflicts (which occur
if packages provide the same files or have wrong file permissions), conflicts
on shared data and configuration files (e.g. imperatively modifying an Apache
web server configuration resulting in an invalid configuration file), other
problems (e.g. drivers installed as kernel modules), conflicts on meta-data
(e.g. false dependencies), and outdated/incorrect information on conflicting
packages, i.e. some conflicts with other packages have been resolved in newer
versions.
Some of these conflicts cannot happen in Nix and NixOS: since the files of a
package are always stored in isolation and cannot be removed or adapted after
they have been built, we do not have file or configuration conflicts on the static
parts of the system. However, we cannot prevent packages from accessing the
same variable data in the /var folder, or web servers from using the same TCP
port at runtime.
Even given the fact that the dependencies and conflicts of packages are known,
which is not always trivial to ensure or to maintain, it is extremely difficult to
install or upgrade a system in such a way that all dependencies are present and
correct and that no packages are in conflict. In fact, a study on the package
management problem [Mancinelli et al., 2006] has shown that it is an NP-complete
problem, suggesting that it is not feasible to solve large instances of this problem
in general. Fortunately, subsets of these problems can be solved efficiently by
using SAT solvers. For example, a tool called apt-pbo [Trezentos et al., 2010] has
been developed (extending Debian’s Apt) exactly for that purpose. Similarly, a
study has shown that deciding co-installability of packages [Di Cosmo and
Vouillon, 2011] is an NP-hard problem. Furthermore, there are many more
studies on dependency and conflict problems, such as strong dependencies
[Abate et al., 2009]: a package p strongly depends on a package q if, whenever
p is installed, q must also be installed.
Another major difference between Nix and the Mancoosi approach is the
fact that Mancoosi keeps the “maintainer scripts” included in packages intact.
For example, a maintainer script of an Apache module may be used to update a
configuration file of the Apache web server. These scripts make it hard to
ensure reproducible deployment or to roll back an imperative operation. Nix
and NixOS, on the contrary, do not use any maintainer scripts; instead, all
static parts of configuration files are generated and provided as packages
stored in isolation, rather than adapted imperatively.
Part of the Mancoosi project is about developing a transactional upgrade
approach for maintainer scripts, for instance by simulating the effects of
a script first to see whether it will succeed [Cicchetti et al., 2010]. In NixOS, a
host system is declaratively configured through NixOS modules, which generate
configuration files in a pure manner. As a result, it is trivial to switch to a
newer or older version by merely flipping a symlink of a Nix profile.
Furthermore, the Mancoosi approach obeys various standards commonly
used in Linux distributions, such as the Filesystem Hierarchy Standard (FHS),
part of the Linux Standards Base (LSB). Nix deviates from the FHS (and there-
fore also from the LSB). The main reason is that the FHS layout does not pro-
vide isolation (it requires packages to be stored in global locations, such as
/usr/lib) and the LSB requires the presence of the RPM package manager,
which may break purity and reproducibility [van der Burg, 2011g]. However,
the disadvantage of deviating from these standards is that third-party binary
software with custom installers must be patched in order to be deployed
through Nix, whereas for conventional package managers no modification is
needed.
In Nix and NixOS, we have chosen purity and reliability over compatibility,
although there are several tools and tricks we can use to allow third party
binary software to be deployed.
GoboLinux (http://www.gobolinux.org), another Linux distribution not obeying
the FHS, shares the same vision that the filesystem can be used to organise
packages in isolation and to achieve better deployment, although it uses a
nominal versioning scheme and does not statically bind packages [van der Burg,
2011a].
2.6 C O N C L U S I O N
In this chapter, we have covered three important foundations of this thesis.
First, the Nix package manager is an automated, model-driven package manager
that borrows concepts from purely functional programming languages, offering
various advantages, such as reliable, efficient and reproducible package builds.
Moreover, we have covered NixOS, a GNU/Linux distribution using the Nix
package manager, which extends the Nix deployment features to complete
system configurations. Finally, we have shown Hydra, a continuous build and
integration server built around Nix.
In the following chapter, we define a reference architecture to fully auto-
mate deployment processes for the latest generation of systems, using the
components described in this chapter, such as the Nix package manager, and
several new components, such as distributed deployment tools, which are
described in the remaining chapters.
3
A Reference Architecture for Distributed
Software Deployment
A B S T R A C T
Various developments in software engineering have increased the deployment
complexity of software systems, especially for systems offered as services
through the Internet. In this thesis, we strive to provide an automated, re-
liable, reproducible and efficient deployment solution for these systems. Nix
and several Nix applications, such as NixOS and Hydra, provide a partial so-
lution to achieve this goal, although their scope is limited to single systems. In
this chapter, we describe a reference architecture for distributed software de-
ployment that captures the properties of a family of domain-specific deployment
systems, containing components that implement various deployment aspects.
The reference architecture can be turned into a concrete architecture for a
domain-specific deployment tool.
3.1 I N T R O D U C T I O N
In Chapter 1, we have argued that due to various developments in software
engineering, the software deployment process has become very difficult in
terms of complexity, reliability, agility and heterogeneity.
We have explained Nix, NixOS and Hydra in Chapter 2: a package man-
ager borrowing concepts from purely functional programming languages, a
Linux distribution completely managed through Nix, and a continuous build
and integration server built around Nix. These tools provide a partial solution
for these deployment issues.
However, the scope of Nix and NixOS is limited to single systems. They are
not applicable to systems composed of distributable services implemented
in various programming languages and component technologies, which must
be deployed in a network of machines with varying capabilities and prop-
erties.
We have argued in the introduction that systems offered as services through
the internet are more complicated than single systems. Apart from the fact
that more machines must be deployed and that components across machines
have dependencies on each other, several important non-functional deploy-
ment requirements must be met that are specific to a domain, such as
reliability, privacy and the licenses under which components are governed.
Some of these aspects cannot be solved in a generic manner.
In this chapter, we describe a reference architecture [Bass et al., 2003; Taylor
et al., 2010] for distributed software deployment, to realise the vision of a
fully automated, distributed, reliable software deployment tool for distributed
systems capable of integrating with the given application domain.
The reference architecture captures useful elements for distributed software
deployment and can be turned into a concrete architecture for a domain-
specific software deployment tool, which can be used to automatically, re-
liably and efficiently deploy a system in a network of machines.
3.2 R E F E R E N C E A R C H I T E C T U R E S
In this thesis, we provide a solution to the software deployment process for the
latest generation of software systems (offered as services through the internet)
by means of a reference architecture. In the introduction we have given a
definition of a reference architecture by Taylor et al. [2010]:
The set of principal design decisions that are simultaneously applicable to mul-
tiple related systems, typically within an application domain, with explicitly
defined points of variation.
Typically, reference architectures capture elements of an entire family of
systems. In this thesis, we use it to capture elements of an entire family
of domain-specific deployment tools performing distributed software deploy-
ment. For example, it simultaneously captures aspects of a deployment tool
for service-oriented medical applications as well as a deployment tool for web
applications, although the implementation of custom components may be dif-
ferent and not all components are required to solve a particular deployment
problem for a given domain.
A concrete architecture for a domain-specific deployment tool can be de-
rived by developing a tool, which incorporates several deployment compo-
nents defined in the reference architecture and by integrating a number of
custom components.
3.3 R E Q U I R E M E N T S
In order to overcome the deployment complexities for service-oriented sys-
tems, a number of functional and non-functional requirements must be met.
3.3.1 Functional requirements
As we have explained, Nix provides a partial solution for the deployment of
service-oriented systems; however, its scope must be extended to distributed
environments.
Figure 3.1 shows a small example deployment scenario of a modern service-
oriented system. The grey colored boxes denote machines in a network. As
can be seen, the system is hosted on two machines, which run different op-
erating systems: the left machine represents a 64-bit Linux machine and the
right machine represents a 32-bit Windows machine.
Figure 3.1 An example deployment scenario of a service-oriented software system
Every machine hosts a number of software components, denoted by white
boxes, and containers, denoted by white boxes with dotted borders. Containers
are software components that host other components, such as web applications,
and make them available to end-users, e.g. through a network connection.
The arrows denote dependency relationships. A component may have a de-
pendency on a component on the same system, but may also be dependent
on components deployed on other systems in the network.
To deploy these service-oriented systems, Nix and NixOS can be used to de-
ploy individual components and entire system configurations on single ma-
chines. However, the following aspects are not managed:
Some components have dependencies on components residing on other
machines in the network. To deploy distributable components, we must
ensure that their dependencies are correct and present.
As with “ordinary” software components, system configurations may
also have dependencies on machines with a particular system config-
uration. For example, a filesystem directory that is hosted on another
machine and shared through a network filesystem (NFS).
We also have to deploy software components that are hosted inside a
container, by modifying the container’s state (e.g. by executing an activa-
tion operation). Currently, NixOS manages such state in an ad-hoc manner,
significantly complicating the deployment of large networks of machines.
To support the latest generation of software systems offered as services, we
also have to deploy distributable components, such as web services, that may
be located on arbitrary systems in a network and may have dependencies on
other distributable components. Apart from services, we must also manage
the infrastructure: the system configurations of all machines in a network.
3.3.2 Non-functional requirements
Apart from functional requirements, the design of the reference architecture
is primarily driven by a number of important quality attributes.
Reliability
Reliability is our most important quality attribute. For reliable deployment, sev-
eral aspects are important. The first reliability aspect is dependency completeness:
all required dependencies must be present and correct, and the system must be
able to find them. Furthermore, components have to be activated in the right
order.
The second reliability aspect is immutability. Most conventional deployment
systems imperatively modify systems. For example, they overwrite existing
files or remove files, which may belong to another component. Such modifica-
tions may break a system, make it hard to undo the changes and it also makes
it harder to reproduce a configuration in a different environment. In our de-
ployment system, it is important to be pure; we want to store components in
isolation and we never want to modify them imperatively.
The third reliability aspect is atomicity. If we upgrade systems, it is desir-
able to make this operation as atomic as possible; the system should either
be in the new configuration or in the old configuration, but never somewhere
in the middle. During an upgrade, a system is temporarily inconsistent and
may produce errors. It is desirable to hide these characteristics from end users
as much as possible. Furthermore, we also want to be able to roll back if the
new configuration does not behave as expected.
The fourth reliability aspect is recovery. After a system has been deployed,
it is not guaranteed to stay operational as various events may occur. For
example, if a system has been deployed in a distributed environment, network
links may temporarily fail or machines may crash and disappear from the
network. In such cases, redeployment is required, which is not always a
trivial process. Therefore, it is desirable to provide a deployment solution that
automatically redeploys a system in such cases.
Reproducibility
As we have explained earlier in Chapter 1, apart from the deployment com-
plexities that we have to overcome, we also want to deploy systems more
frequently and in test environments, so that systems can be tested before they
are deployed in production environments. In order to be sure that a sys-
tem works in a production environment as expected, a system configuration
should be reproducible; we want a system to behave exactly as developers
have intended in the test environments.
Genericity
As explained earlier, modern software systems are distributed and heteroge-
neous. Components implementing a service are implemented in various pro-
gramming languages, various component technologies and are deployed on
machines having different architectures and system resources. The deploy-
ment tools realised from this reference architecture must have the ability to
cope with various types of components and systems in a generic manner.
Extensibility
As explained earlier, not every deployment aspect can be solved in a generic
manner. For example, heterogeneous systems may use various communica-
tion and discovery protocols, are composed of various types of services and
they have various deployment quality attributes.
Quality attributes are specific to a domain. For example, the notion of
reliability in a hospital environment may be different from reliability in the
financial domain. Furthermore, certain quality attributes are more important
in one domain than another. Therefore, this functionality should be provided
by the developers realising a concrete architecture for a deployment tool and
these extensions must be easily integrated.
Furthermore, not all components from the reference architecture may be
required to automate a deployment process for a specific domain. For exam-
ple, in certain scenarios it may be more desirable to manage the infrastructure
by other means, e.g. because a machine runs a proprietary operating system
that cannot be supported through Nix-based tools. Also, infrastructure deployment
may be sufficient, without having an explicit notion of service components,
because systems may be homogeneous, such as database clusters.
Efficiency
The last quality attribute of importance is efficiency. Software systems tend
to become larger and more complicated. As a result, an inefficient deploy-
ment process may take a large amount of time. Therefore, it is desirable to
make deployment processes as efficient as possible; only the work that is
actually required should be done. However, efficiency must not compromise the
previous quality attributes, as they are more important.
3.4 C O M P O N E N T S
Based on our requirements and the deployment tools that we have described in
the previous chapter, our reference architecture provides a number of compo-
nents, each providing a solution for a specific deployment aspect. These
components manifest themselves as tools and are covered in the remaining
chapters of this thesis.
3.4.1 Nix: Deploying immutable software components
Nix serves as the fundamental basis for this research and provides automated,
reliable deployment for static or immutable components on single systems.
Deployment with Nix is model-driven. For common components, such as
compilers, libraries and end-user applications, a repository of Nix packages
called Nixpkgs can be used, which provides expressions for many frequently
used packages found on many Linux systems. Nix has been described earlier
in Chapter 2.
3.4.2 NixOS: Deploying complete system configurations
As explained earlier in Chapter 2, NixOS uses the Nix package manager to de-
ploy complete system configurations from a NixOS configuration file. NixOS
serves as the basis of our distributed infrastructure deployment tools.
3.4.3 Disnix: Deploying services in networks of machines
Nix is designed for deployment of software components on single systems
and management of local dependencies (which we call intra-dependencies).
Disnix extends Nix with additional tools and models to deploy services in
a network of machines and manages inter-dependencies: dependencies be-
tween services, which may be located on different machines in the network.
Disnix is covered in Chapter 4. Disnix is also capable of (almost) atomically
upgrading a service-oriented system. This aspect is covered in Chapter 8.
3.4.4 Dynamic Disnix: Self-adaptive deployment of services in networks of machines
After a service-oriented system has been deployed in a network of machines,
various events may occur, such as a network failure or crash, which may
render a software system (partially) unavailable for use, or may significantly
downgrade its performance. In such cases, a system must be redeployed.
In order to redeploy a system with Disnix, the deployment models must be
adapted to reflect a new desirable deployment configuration, which can be
very labourious and tedious, especially if it is also desirable to meet various
technical and non-functional requirements.
Dynamic Disnix provides additional features to make service-oriented sys-
tems self-adaptable from a deployment perspective. A discovery service dy-
namically generates an infrastructure model. A distribution model can be
generated from a QoS policy model, composed of chained filter functions,
which use deployment planning algorithms to map services to machines in
the network, based on technical and non-functional properties defined in the
service and infrastructure models. Dynamic Disnix is covered in Chapter 5.
3.4.5 DisnixOS: Combining Disnix with complementary infrastructure deployment
Disnix performs deployment of service components, but does not manage
the underlying infrastructure. For example, application servers and database
management systems (DBMS) still have to be deployed by other means. Dis-
nixOS provides complementary infrastructure deployment to Disnix, using
features of NixOS and Charon, so that both the underlying infrastructure and
services can be automatically deployed together. DisnixOS is described in
Chapter 6.
3.4.6 Dysnomia: Deploying mutable software components
Because Nix borrows purely functional programming language concepts, not
all types of components can be properly supported, such as databases. In
this thesis, we call these types of components mutable software components.
Currently, in Nix-related applications mutable components are still deployed
on an ad-hoc basis, which can be very inconvenient in networks of machines.
Dysnomia supports automated deployment of mutable software components.
Deployment steps of mutable components cannot be done generically, because
they have many forms. A plugin system is used to take care of the deployment
steps of each component type. Dysnomia provides a repository of modules for
a number of component types (Dysnomia modules) and can be easily extended.
Dysnomia is covered in Chapter 9.
3.4.7 grok-trace: Tracing source artifacts of build processes
As we have explained in the introduction, the licenses under which compo-
nents are governed are an important non-functional deployment requirement,
as not obeying these licenses properly may result in legal action by the copy-
right holders. The grok-trace tool can be used in conjunction with Nix to
dynamically trace build processes and to produce a graph from these traces
that shows how a build artifact, such as a binary, is produced. This informa-
tion can be used in conjunction with a license analyser to determine under
which licenses the inputs are covered and how these inputs are combined into
the final product. grok-trace is covered in Chapter 10.
3.5 A R C H I T E C T U R A L PAT T E R N S
The components in the previous section automate several deployment aspects
and must be combined in a certain way to provide a fully automated deploy-
ment solution for a particular domain, having our desired quality attributes.
In this section, we describe the architectural patterns that we use for this pur-
pose.
3.5.1 Layers
The main purpose of the reference architecture is to support distributed de-
ployment. We have identified two distributed deployment concerns, namely
distributed service deployment and distributed infrastructure deployment.
Domain-specific deployment tools typically have to address either or both
of these concerns. The result of deployment actions is typically a collection
of mutable and immutable components, which must be deployed by the Nix
and Dysnomia deployment mechanisms.
This observation shows a clear distinction between concerns in an ordered
fashion. To create a clear dependence structure, we have used an architectural
pattern from [Taylor et al., 2010] called virtual machines in which groups of
components implementing deployment aspects are layered in a strict manner.
In our reference architecture (shown later in Figure 3.2), we have three layers
capturing four concerns:
The bottom layer concerns the deployment of individual software com-
ponents on single machines.
The middle layer provides distributed deployment facilities built on top
of the tools provided in the layer underneath. Distributed deployment
concerns two aspects. First, systems may be composed of interconnected
distributable components (also called services in this thesis). Second,
in order to deploy service-oriented systems we also have to provide
infrastructure, consisting of machines with the right operating systems,
system services and capabilities.
The top layer contains high-level deployment tools for applications, which
invoke tools from the underlying layer to provide distributed service
and infrastructure deployment.
3.5.2 Purely functional batch processing style
Apart from an architecture that groups concerns in separate layers, we also
have to combine the individual components. Traditionally in the UNIX world,
most deployment tasks are provided by command-line tools, which can be
called by end-users or from scripts that combine several tools to automate
entire procedures.
We have chosen to adopt ideas of this style, which is also known as the
batch processing architectural pattern [Taylor et al., 2010], for the following
reasons:
A user should be able to perform steps in the deployment process sep-
arately. For instance, a user may want to build all the components de-
fined in the models to see whether everything compiles correctly, while
he does not directly want to activate them in the network.
Testing and debugging is relatively easy, since everything can be in-
voked from the command line by the end user. Moreover, components
can be invoked from scripts, to easily extend the architecture.
Components are implemented with different technologies. Some tools
are implemented in the Nix expression language, because they are re-
quired to build components by the Nix package manager. Other tools
are implemented in the C and C++ programming languages for effi-
ciency and others are shell scripts, which can be used to easily glue
various tools together.
Prototyping is relatively easy. A tool can be implemented in a high level
language first and later reimplemented in a lower level language.
These principles are very common in the development of UNIX applications
and are inspired by Raymond [2003].
The ordinary batch processing architectural pattern has a number of draw-
backs in achieving our desired quality attributes. For example, the results of
processes are non-deterministic and have side effects, i.e. invoking the same
operation multiple times may yield different results. This also makes it hard
to reproduce the same execution scenario. Another big drawback is that all
the operations are executed sequentially and, due to side effects, are hard to
execute in parallel, limiting the efficiency of deployment procedures.
In order to overcome these limitations, we propose our own variant on this
architectural pattern, called purely functional batch processing. This architectural
pattern has not yet been described in the literature, and neither has the purely
functional style, although Taylor et al. [2010] list “Object-Oriented” (origi-
nating from a different class of programming languages) as an architectural
pattern.
In the purely functional batch processing style, we execute arbitrary pro-
cesses in isolated environments, in which side effects are removed, e.g. by
restricting file system access and clearing environment variables, so that only
explicitly specified dependencies can be found. Furthermore, processes al-
ways produce their results in a cache on the file system, so that each process
only has to be executed once and can never interfere with other processes
running in isolated environments.1
The Nix builder environment and the Nix store described earlier in Chap-
ter 2 are means to implement such a purely functional batch processing
style. However, the implementation is not restricted to Nix. In the following
chapters, we will see that this pattern also applies to remote upgrades, assum-
ing that the remote machine can be trusted. Later in Chapter 9 we will also
show a different approach for mutable software components.
1 These measures give a relatively high degree of purity, but there are always situations in
which side-effects can be triggered, e.g. by generating timestamps or random numbers. However,
as we have seen in Chapter 2 this is not a problem for most tools that we use in automating
deployment.
The purely functional batch processing pattern is a crucial means to achieve
our desired quality attributes. An advantage of this pattern is that there is rel-
atively little chance of side effects, which ensures reliability (because depen-
dencies must always be explicitly defined and errors are raised if such specifica-
tions are incomplete) and reproducibility (because no unspecified dependency
affects our results and processes are always executed deterministically).
Furthermore, because the results are immutable, we can also cache them
and return a result from a persistent cache if the function has been executed
before, providing efficiency. Moreover, because we can support the laziness
of purely functional languages, we only have to perform what is really
necessary, and we can easily parallelise tasks because the dependency closures
are always known.
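The following minimal sketch illustrates this style with plain Nix; the derivation name and the build command are purely illustrative. The builder only sees its declared inputs, and its output is cached immutably in the Nix store, so building the expression a second time simply returns the previously produced store path without re-executing the process:

  with import <nixpkgs> {};

  stdenv.mkDerivation {
    name = "batch-example";

    # The builder runs in an isolated environment in which only declared
    # dependencies are available; its output ends up immutably in the Nix store.
    buildCommand = ''
      echo "hello from an isolated build step" > $out
    '';
  }

Evaluating and building this expression twice, e.g. with nix-build, yields the same store path the second time without running the builder again, which is exactly the caching behaviour this pattern relies on.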
The genericity and extensibility attributes are achieved, because we are
allowed to invoke arbitrary processes (which can be implemented in any pro-
gramming language) in the isolated builder environments.
Some quality attributes cannot be immediately achieved through this pat-
tern. For example, to achieve full distributed atomicity, we have some extra
requirements on the system that must be deployed, as we will see in Chap-
ter 8. Recovery is a quality attribute that must be implemented by Disnix and
Charon. We will extensively cover this aspect in Chapter 5.
3.6 A P P L I C AT I O N S
By using the components in the reference architecture and the architectural
patterns, we can implement domain-specific deployment solutions and we
can deploy systems in various environments, ranging from test environments
to production environments.
3.6.1 Hydra: Continuous building and integration
Hydra, the Nix-based continuous build and integration server described in
Chapter 2, is not part of the reference architecture, as it does not implement
a deployment concern. Instead, it is a tool orchestrating build jobs. Since the
reference architecture is built around Nix, the resulting domain-specific de-
ployment solution can also be invoked non-interactively from a Hydra build
job. In Chapter 7, we use Hydra to automatically perform system integration
tests.
3.6.2 webdsldeploy: Deploying WebDSL applications in networks of machines
WebDSL [Visser, 2008] has been used as a case study for supporting auto-
mated distributed deployment of WebDSL applications from a high level of
abstraction. WebDSL is a case study in domain-specific language engineering
and provides various sublanguages capturing various concerns of web appli-
cations. From a WebDSL specification, a Java web application is generated.
Figure 3.2 A reference architecture for distributed software deployment
The webdsldeploy tool makes it possible to declaratively deploy a WebDSL
application in a network of machines, which can be hosted in both physical
networks and virtual networks of machines hosted in the environment of an
IaaS provider. The webdsldeploy tool generates a NixOS network specifica-
tion, which can be used with Charon and the NixOS test driver. webdsldeploy is
described in Chapter 11.
3.7 A N A R C H I T E C T U R A L V I E W O F T H E R E F E R E N C E A R C H I T E C T U R E
Figure 3.2 shows an architectural view of our reference architecture for dis-
tributed software deployment:
The layered surfaces denote clusters of deployment aspects, which cor-
respond to the layers we have described in Section 3.5.1.
The ovals denote the components described in Section 3.4, which are
implemented as collections of command-line tools and Nix functions.
The labeled boxes denote models that can be provided by end-users or
generated by other components in the reference architecture. Generated
models are produced in a purely functional manner and also cached in
the Nix store for efficiency.
The arrows denote dependency relationships between these components.
Dependencies result in interactions between these components using the
purely functional batch processing architectural pattern.
In the top layer, the application layer, we have included an abstract cus-
tom application deployment tool, which can utilise components from the
lower layers to automate deployment for a particular domain. Although
not strictly part of our reference architecture, the webdsldeploy tool has
been included in the top layer as well.2
3.8 R E L AT E D W O R K
The idea of designing software architectures to create software deployment
automation solutions is not entirely new. A prominent example is described
by Eilam et al. [2006] (and related papers from the IBM Alpine project describ-
ing various deployment tasks [Konstantinou et al., 2009; El Maghraoui et al.,
2006]).
Their approach is mainly focused on interaction among stakeholders in
a deployment process, such as software architects, developers and system
administrators, who each write their own specifications capturing their de-
ployment concerns. These models are stored in a central repository. Various
model-to-model transformation tools are used to synthesise these stakeholder
models and to generate executable specifications that various deployment
tools can use to automate deployment tasks.
However, the Alpine architecture is not focused on completely automating
deployment or on achieving our desired quality attributes, such as reliability and
reproducibility. For example, deployment tools are invoked in an ad-hoc manner
and can easily produce side effects and destructively upgrade systems.3
There are more general deployment solutions capable of completely au-
tomating software deployment processes and capable of incorporating exter-
nal tools. Cfengine [Burgess, 1995] and related tools, such as Puppet [Puppet
Labs, 2011] and Chef [Opscode, 2011], can modify any file and invoke any
process on a remote machine. However, these tools execute operations in a
convergent manner, in which the end result of an imperative operation is de-
scribed and only the steps that are required to reach it are performed.
Although nobody has derived architectural concepts from these tools, we
identify the architectural pattern used to combine tools in Cfengine (and fam-
ily) as “convergent batch processing”. This architectural pattern achieves a num-
ber of similar goals to the purely functional batch processing pattern, such as
genericity, since any file and tool can be invoked, and efficiency, as only oper-
ations are done that must be done. However, it does not ensure the reliability
aspect, as dependencies are not guaranteed to be complete and imperative
2 Developers of application deployment tools utilising WebDSL may want to use abstractions
on top of webdsldeploy.
3 In theory, it may also be possible to adapt the architecture to generate Nix expressions
and use our tools to deploy systems. However, in our reference architecture the desired quality
attributes come for free.
operations may destructively update system configurations. Furthermore, op-
erations can also affect reproducibility of system configurations, because they
depend on the machine’s current state.
The Software Dock [Hall et al., 1999] uses agents to implement a cooperative
deployment approach between suppliers (through release docks) and clients
(through field docks) and uses specifications in a common language, named
Deployable Software Descriptions (DSD). Although these agents implement
all the required software deployment operations, such as release and install, to
automate an entire deployment process, it is up to an agent’s implementation
to achieve the intended quality attributes, such as reliability and reproducibility.
The agents are not prohibited, for example, from imperatively or destructively
updating components on a particular system.
The Mancoosi Package Manager (MPM) [Abate et al., 2013] is a modular
package manager architecture with an emphasis on the upgrade component,
which the authors consider the most difficult aspect of a package manager.
The MPM upgrade component can be reused in many package managers
(such as RPM or dpkg) as well as in other software installers, such as the
Eclipse plugin manager, and uses SAT solvers to ensure that all required
dependencies are present and that no conflicts occur. Moreover, the upgrade
component can also be hosted externally, for example in a data center, and
uses the Common Upgradeability Description Format (CUDF) as an inter-
change format for upgrade data.
Build tools, such as Make [Feldman, 1979] and Apache Ant (http://ant.apache.org),
also allow developers to invoke arbitrary tools and to parallelise tasks. However,
they allow side effects to be programmed as well as files to be imperatively
modified, affecting reliability and reproducibility.
Practitioners solving deployment problems typically create shell scripts
chaining a collection of operations together. These scripts use the ordinary
batch processing architectural pattern and have many drawbacks, such as the
fact that any operation can imperatively and destructively modify a system
and that operations have side effects affecting the reproducibility, because
they have access to the complete filesystem and depend on the filesystem’s
current state. Furthermore, it is also difficult to parallelise deployment tasks
or to cache intermediate results to make deployment processes more efficient.
3.9 C O N C L U S I O N
In this chapter, we have described a number of important functional attributes
as well as non-functional quality attributes for a family of domain-specific
deployment systems. From these properties, we have derived a reference ar-
chitecture for distributed software deployment capturing various deployment
aspects. The purely functional batch processing architectural pattern allows
us to achieve a number of important quality attributes. The reference archi-
tecture can be transformed into a concrete architecture for a domain-specific
deployment tool by creating a tool invoking deployment components and by
integrating custom components that implement domain-specific features.
In the following chapters, we describe the individual components, general
concepts and applications of the reference architecture. In Chapter 13, we
evaluate how well we have achieved our desired quality attributes.
Part II
Service Deployment
4
Distributed Deployment of Service-Oriented
Systems
A B S T R A C T
Deployment of a service-oriented system in a network of machines is often
complex and labourious. In many cases, components implementing a service
have to be built from source code for a specific target platform, transferred to
machines with suitable capabilities and properties, and activated in the right
order. Upgrading a running system is even more difficult as this may break
the running system and cannot be performed atomically. Many approaches
that deal with the complexity of a distributed deployment process only sup-
port certain types of components or specific environments, while general so-
lutions lack certain desirable non-functional properties, such as dependency
completeness and the ability to store multiple variants of components safely
next to each other. In this chapter, we explain Disnix, a deployment tool
which allows developers and administrators to reliably deploy, upgrade and
roll back a service-oriented system consisting of various types of components
in a heterogeneous environment using declarative specifications.
4.1 I N T R O D U C T I O N
The service-oriented computing (SOC) paradigm is nowadays a very popular
way to rapidly develop low-cost, interoperable, evolvable, and massively dis-
tributed applications [Papazoglou et al., 2007]. The key to realising this vision
is the Service Oriented Architecture (SOA), in which a system is decoupled
into “services”: autonomous, platform-independent entities that can be
described, published, discovered and loosely coupled, and that perform
functions ranging from answering simple requests to executing sophisticated
business processes.
While a service-oriented system provides all kinds of advantages, such as
integrating a system into the Internet infrastructure, deploying it is often very
labourious and error-prone, because many deployment steps are performed
manually and in an ad-hoc fashion. Upgrading is even more complex. Re-
placing components may break the system, and upgrading is not an atomic
operation; i.e. while upgrading, it is observable from the outside that the sys-
tem is changing. This introduces all kinds of problems to end users, such as
error messages or inaccessible features.
Because the software deployment process of a service-oriented system in a
network is very hard, it is also a major obstacle to reaching such systems’ full
potential. For instance, a system administrator might want to run every web service
of a system on a separate machine (say, for privacy reasons), but refrain from
doing so because it would take too much deployment effort. Similarly, actions
such as setting up a test environment that faithfully reproduces the produc-
tion environment may be very expensive.
Existing research in dealing with the complexity of the software deploy-
ment process in distributed environments is largely very context-specific; e.g. de-
signed for a particular class of components [Rutherford et al., 2002; Akkerman
et al., 2005] or specific environments (such as Grid Computing [Caron et al.,
2006]). Other approaches illustrate the use of component models and lan-
guages [Chen and Simons, 2002; Costanza, 2002] which can be used to make
deployment easier. While existing research approaches provide useful fea-
tures to reduce the complexity of a particular software deployment process
and also support various non-functional aspects, few of them deal with com-
plete heterogeneous systems consisting of multiple platforms, various types
of components and complete dependencies, i.e. services and all their depen-
dencies, including infrastructure components such as the database back-ends
of which service-oriented systems may be composed.
The contribution of this chapter is a deployment tool, Disnix, which uses
models of the components of a system including all its dependencies, along
with the machines in the heterogeneous network in which the system is to be
deployed. These models are then used to automatically install or upgrade the
complete system efficiently and reliably, to support various types of services
and protocols by using an extensible approach, and to upgrade or roll back the
system almost atomically.
As a case study, we have used Disnix to automate the deployment of SDS2,
a SOA-based system for asset tracking and utilization services in hospital
environments developed at Philips Research. The goal of this case study is to
evaluate the feasibility of the proposed approach.
4.2 M O T I VAT I O N : S D S 2
As main case study, we have used a SOA-based system developed at Philips
Research, called the Service Development Support System (SDS2) [de Jonge
et al., 2007]. SDS2 is a platform that provides data abstractions over huge data
sets produced by medical equipment in a hospital environment.
4.2.1 Background
Medical devices produce very large amounts of data, such as status logs,
maintenance logs, patient alarms, images, location events and measurements.
These data sets often have poorly-defined structures, e.g. most of these data
sets are represented as logfiles without any clear structure. SDS2 is a plat-
form providing services which extract useful data from these implicit data
sets and transform it into useful information. Furthermore, this information
can be presented to various stakeholders in a hospital environment.
Figure 4.1 Transformers architecture
In SDS2, web services fulfil the role of transformers, creating abstraction
layers for the data sets stored in the data repositories, as shown in Figure 4.1.
By combining various transformers, data sets can be turned into useful and
concrete information with a well-defined interface. Data abstractions, such
as workflow analysis results, can be presented by one of the web application
front-ends to the stakeholders.
4.2.2 Features
SDS2 provides two web application front-ends serving the following stake-
holders:
Nurse. The nurse mainly uses SDS2 to check the location and statuses
of medical equipment located on a particular floor in a hospital. For
example, if a device reports an alarm, the nurse wants to investigate the
situation as quickly as possible.
Field service engineer. The field service engineer is responsible for main-
tenance of medical equipment. Apart from the locations and statuses
of medical devices, the field service engineer also wants to know what
events have happened in the past (e.g. check if it was broken) and how
a particular device is (ab)used. Furthermore, the field service engineer
may want to check out the maintenance records of a particular device
and eventually update them.
Figure 4.2 The SDS2AssetTracker service
Hospital administrator. The hospital administrator mainly uses the sys-
tem for workflow analyses and to check out how efficiently devices are
used.
Asset Tracker service
The Asset Tracker service is mainly used by the nurse and field service engi-
neer. An impression of this web application front-end is shown in Figure 4.2.
The Asset Tracker shows a floor map with medical devices and their actual
real-time statuses. In this application, the end user can switch between floors,
pick a device from the map, check its reported events and inspect its
maintenance records. In our example, three devices are marked with a red
color symbol indicating that they are broken and require maintenance.
Utilisation service
The Utilisation service is primarily used by the hospital administrator. An
impression of this web application front-end is shown in Figure 4.3.
Similar to the Asset Tracker service, the Utilisation service shows floor
maps, but instead of showing the actual locations and statuses of devices,
it can be used to query workflow analysis results of particular devices (or
groups of devices) over a specified period of time. In our example, it shows
the intensity of travelled paths of a particular device between rooms. The
green color indicates low intensity, while a red color indicates high intensity.
Figure 4.3 The SDS2Utilisation service
4.2.3 Architecture
Figure 4.4 shows the architecture of SDS2. The architecture consists of three
layers:
The database layer contains databases storing event records, such as all
the status events (mobileeventlogs) and location events (mobilelocationlogs)
that medical devices produce. Furthermore, many services in the layer
below have to store or cache various records, such as device information
(hospitalinstalledbase) and maintenance records (mobileutilizationlogs). The
ejabberdDevices database is used by the ejabberd XMPP server to allow
devices to broadcast their statuses.
The transformers layer contains web services, providing a well-defined
interface to implicit data sets (MELogService, MULogService, HIBService),
as well as services which extract and synthesise data sets into more
useful information (ME2MSService), such as workflow analysis reports.
The FloorPlanService web service provides floor maps. Furthermore, there
are batch processes used to simulate status and location events
(SDS2EventGenerator), store events in databases (XMPPLogger) and merge
status and location events (SDS2EventMerger) so that their origin can be
tracked.
The presentation layer contains the web application front-ends described
in the previous subsection, which can be used by the stakeholders
Figure 4.4 Architecture of SDS2. Rectangles denote web services and ovals rep-
resent batch processes. Databases and web applications are represented by im-
ages. Arrows denote dependency relationships.
(SDS2AssetTracker and SDS2Utilisation). The DemoPortal is a web application
that serves as an entry portal to all stakeholders, which redirects them
to the service they want to use.
It is important to notice that all components in Figure 4.4 are distributable
components (called services in this thesis). For example, the mobileeventlogs
database may be hosted on another machine than the SDS2AssetTracker web
application. Components in the architecture typically communicate with each
other using RPC or network protocols, such as SOAP (for web services),
XMPP (with ejabberd) and ordinary TCP connections (for database connec-
tions). It may even be possible that every component is located on a separate
machine, but it is also possible that all components are deployed on a single
machine.
4.2.4 Implementation
SDS2 is a heterogeneous system, as it uses various types of components as well
as various technologies:
MySQL. All the event logs produced by the devices as well as caches are
stored in MySQL databases.
Ejabberd. Events that are produced by medical devices are broadcast
through one or more Ejabberd servers.
Apache Tomcat and Apache Axis2. All the web service components are im-
plemented in the Java programming language using the Apache Axis2
library. For hosting the web application components, Apache Tomcat is
used.
Google Web Toolkit. The web application front-ends are implemented
using the Google Web Toolkit (GWT) using the Java programming lan-
guage. The GWT toolkit is used to compile Java code to efficient Java-
Script code.
4.2.5 Deployment process
The original deployment process of the SDS2 platform is laborious.
First, a global configuration file must be created, which contains URLs of all
the services of which SDS2 consists. Second, all the platform components
have to be built and packaged from source code (Apache Maven [Apache
Software Foundation, 2010] is used for this). For every machine that provides
data storage, MySQL databases have to be installed and configured. For every
Ejabberd server, user accounts have to be created and configured through a
web interface. Finally, the platform components have to be transferred to the
right machines in the network and activated.
Drawbacks
The SDS2 deployment process has a number of major drawbacks. First, many
steps are performed manually, e.g. deploying a database, and are therefore
time-consuming and error prone. Furthermore, up-to-date deployment docu-
mentation is required, which accurately describes how the deployment process
should be performed.
Moreover, some steps have to be performed in the right order; e.g. a web
service that provides access to data in a MySQL database requires that the
database instance is started before the web service starts. Performing these
tasks in the wrong order may bring the web service into an inconsistent state,
e.g. due to failing database connections that are not automatically restored.
Upgrading is difficult and expensive: when changing the location of a serv-
ice, the global configuration file needs to be adapted and all the dependent
services need to be updated. Furthermore, it is not always clear what the
dependencies of a service are and how we can upgrade them reliably and
efficiently, i.e. we only want to replace the necessary parts without breaking
the entire system. Upgrading is also not an atomic operation; when chang-
ing parts of the system it may be observable to end users that the system is
changing, as the web front-ends may return incorrect results.
In heterogeneous networks (i.e. networks consisting of systems with various
architectures) the deployment process is even more complex, because we have
to compile and test components for multiple platforms.
Finally, since the deployment process is so expensive, it is also expensive
to deploy the platform in a test environment and to perform test-cases on
them. The deployment complexity is proportional to the number of target
machines in the network: if for one machine c components must be deployed
(including all their dependencies), then for x machines c × x components must
be deployed in the worst case.
Complexity
The deployment process of SDS2 is typically performed on a single system,
which is already complicated. There are various reasons why a developer or
system administrator wants to deploy components from Figure 4.4 on multi-
ple machines, rather than a single machine.
For instance, a system administrator may want to deploy the DemoPortal
web application on a machine which is publicly accessible on the Internet,
while data sets, such as the mobileeventlogs, must be restricted from public ac-
cess since they are privacy-sensitive and, in some countries, the law prohibits
storing medical data outside the hospital environment.
It may also be desirable that web services performing analysis jobs over data
sets are located within the Philips Enterprise environment, since Philips has
a more powerful infrastructure to perform analysis tasks and may want to use
analysis results for other purposes.
Moreover, each database may need to be deployed on a separate machine,
since a single machine may not have enough disk space available to store all
the data sets. Furthermore, multiple redundant instances of the same compo-
nent may have to be deployed in a network, to offer load-balancing or failover.
All these non-functional aspects require a distributed deployment scenario,
which significantly increases the complexity of the deployment process of
SDS2.
Requirements
To deal with the complexity of the software deployment process of SDS2 and
similar systems, we require a solution that automatically deploys a service-
oriented system in an environment taking dependencies into account. Per-
forming the deployment process automatically is usually less time-consuming
and error-prone. Moreover, with a solution that takes all dependencies into
account we never end up with a broken system due to missing dependencies and
we can also deploy the dependencies in the right order derived from the de-
pendency graph. We also want to efficiently upgrade a service-oriented system
by only replacing the changed parts and taking the dependencies into account
so that all dependencies are always present and deployed in the right order.
In order to automate the installation and upgrade, we have to capture the
services of a system and the infrastructure in declarative specifications. Using
these specifications, we can derive the deployment steps of the system, and
reproduce a previous deployment scenario in an environment of choice, such
as a test environment.
Furthermore, we require features to make the deployment process atomic,
so that we never end up with an inconsistent system in case of a failure and
can guarantee that it is not observable to end users that the system is chang-
ing.
Figure 4.5 Overview of Disnix
Since the underlying implementation of a service could be developed for
various platforms with various technologies, we should be able to build the
service for multiple platforms and integrate with the existing environment.
While there are solutions available that deal with these requirements, few
have a notion of complete dependencies, and most are only applicable within
a certain class of component types or environments and lack desirable non-
functional properties such as atomic upgrades. Therefore, a new approach is
required.
The technology used to implement SDS2 is very common in realising service-
oriented systems. The problems described in this section are also applicable
to other systems using the same or similar technology.
4.3 DISNIX
To solve the complexities of the deployment process of distributed systems
such as SDS2, we have developed Disnix [van der Burg et al., 2008; van der
Burg and Dolstra, 2010a,d] (http://nixos.org/disnix), a distributed de-
ployment extension to the Nix deployment system [Dolstra et al., 2004; Dol-
stra, 2006].
As explained in Chapter 2, Nix is a package manager providing several
distinct features. It stores components in isolation from each other and pro-
vides a purely functional language to specify build actions. Nix only deals
with intra-dependencies, which are either run-time or build-time dependen-
cies residing on the same machine. Disnix provides additional features on
top of Nix to deal with distributed systems, including management of inter-
dependencies, which are run-time dependencies between components residing
on different machines. In this chapter we give an overview of Disnix and
we show how it can be used to automate the software deployment process of
SDS2.
4.3.1 Overview
Figure 4.5 shows an overview of the Disnix system. It consists of the disnix-env
tool that takes three models as input:
The services model describes the services of which the system is com-
posed, their properties and their inter-dependencies. Usually this spec-
ification is written by the developers of the system. This model also
includes a reference to all-packages.nix, a model in which the build func-
tions and intra-dependencies of the services are specified.
The infrastructure model describes the target machines in the network on
which the services can be deployed. Usually this specification is writ-
ten by system administrators or can be generated by using a network
discovery service.
The distribution model maps the services to machines in the network.
Usually, this specification is written by system administrators, or gener-
ated automatically.
As with ordinary Nix expressions, the described models are implemented
in the Nix expression language, a simple purely functional language used to
describe component build actions.
All the remote deployment operations are performed by the DisnixService,
a service that exposes deployment operations through a custom RPC protocol
and must be installed on every machine in the network. All the deployment
operations are initiated from a single machine, called the coordinator.
For example, to deploy SDS2, the developers and administrators must write
instances of the three models capturing the services of which SDS2 is com-
posed and the machines on which the components must be deployed. The
following command then suffices to deploy SDS2:
$ disnix-env -s services.nix -i infrastructure.nix \
-d distribution.nix
This command builds all components of SDS2 from source code for the in-
tended target platforms, including all their intra-dependencies, transfers them
to the selected target machines, and activates all services in the right order.
To upgrade SDS2 or change its configuration, the user modifies the models,
and runs disnix-env again. Disnix will rebuild and transfer any components
that have changed. After the transfer phase, a transition phase is started in
which obsolete services that are no longer in use are deactivated and new
services are activated. (In this phase only the changed parts of the system
are updated taking the inter-dependencies and their activation order into ac-
count.) If the activation of a particular service fails, a rollback is performed in
which the old configuration is restored.
Since the machines in the network could be of a different type than the
machine on which the deployment tool is running, the coordinator might
have to delegate the build to a machine capable of building the service for the
{ intraDep1, intraDep2 }: 40
{ interDep1, interDep2 }: 41

buildFunction { ... } 42
Figure 4.6 Structure of a Disnix expression
target platform (e.g. an i686-linux machine cannot compile a component for an
i686-freebsd system). When there is no dedicated build machine available for
this job, Disnix also provides the option to perform the build on the selected
target machine in the network. Disnix performs actions on remote machines
(such as transferring, building and activating) through a remote service, the
DisnixService, that must be installed on every target machine.
Usually, there is only one coordinator machine involved in deployment
processes. However, it is also possible to have multiple coordinators, which
can either deploy the same configuration or a different configuration (which
is made distinct by a profile identifier). Because of the purely functional de-
ployment model of Nix, we can safely deploy multiple variants of packages
next to each other without interference. At runtime, however, a process from
one configuration may conflict with a process from a different configuration.
Disnix does not keep track of these runtime conflicts between different
configurations; in case of a runtime error, it performs a rollback because the
activation process fails. A system administrator has to verify manually that
no conflicts occur. For reliability, it is not recommended to deploy multiple
configurations in the same environment. As a possible solution, two
configurations can be merged into a single configuration and then deployed
by Disnix; because all dependencies are captured in the models, they can be
statically verified.
4.3.2 Building a service
For every service defined in the architecture shown in Figure 4.4, we have
to write a Disnix expression, analogous to how for every Nix pack-
age we have to write a Nix expression defining how it can be derived from
source code and its dependencies. Furthermore, the component must also
be configured to be able to connect to its inter-dependencies, for example by
generating a configuration file containing connection settings in a format that
the component understands.
The general structure of a Disnix expression is outlined in Figure 4.6. The
structure of a Disnix expression is almost the same as an ordinary Nix expres-
sion. A Nix expression is typically a function, whereas a Disnix expression is
a nested function defined in the Nix expression language:
40 The outer function header defines the intra-dependency arguments, which
are build-time and run-time dependencies residing on the same sys-
tem where the component is deployed. These dependencies could be
libraries, compilers and configuration files.
{javaenv, config, SDS2Util}: 43
{mobileeventlogs}: 44

let
  jdbcURL = "jdbc:mysql://"+
    mobileeventlogs.target.hostname+":"+
    toString mobileeventlogs.target.mysqlPort+"/"+
    mobileeventlogs.name;
in
javaenv.createTomcatWebApplication rec { 45
  name = "MELogService";

  contextXML = '' 46
    <Context>
      <Resource
        name="jdbc/sds2/mobileeventlogs" auth="Container"
        type="javax.sql.DataSource"
        username="${mobileeventlogs.target.mysqlUsername}"
        password="${mobileeventlogs.target.mysqlPassword}"
        url="${jdbcURL}" />
    </Context>'';

  webapp = javaenv.buildWebService rec { 47
    inherit name;
    src = ../../../WebServices/MELogService; 48
    baseDir = "src/main/java";
    wsdlFile = "MELogService.wsdl";
    libs = [ config SDS2Util ]; 49
  };
}
Figure 4.7 Build expression of MELogService
41 The inner-function header defines the inter-dependency arguments, which
are run-time dependencies on services which may be located on differ-
ent machines in the network, such as web services or databases.
Every inter-dependency argument is a set of attributes, containing var-
ious properties of a service defined in the services model, shown later
in Section 4.3.4. A special attribute of an inter-dependency is the tar-
gets property, which contains a list of machine properties on which the
inter-dependency is deployed. These machine properties are defined
infrastructure model, shown later in Section 4.3.5. The target attribute
points to the first targets element, which can be used for convenience.
42 In the remainder of the Disnix expression, the component is built from
source code, and the given intra-dependencies and inter-dependency
parameters.
Figure 4.7 shows a Nix expression for the MELogService service component
of SDS2, representing an Apache Tomcat web application containing a web
service archive built from Java source code. As we have explained earlier in
Section 4.2.3, the MELogService is a service providing an interface to log records
stored in a MySQL database, which can reside on a different machine. In
order to allow the service to connect to the database server, the expression
generates a so-called context XML file that specifies connection settings of the
MySQL database:
43 The outer function representing the intra-dependencies specifies the
config and SDS2Util components, which are libraries of the SDS2 platform,
and javaenv, which is used to compile and package Java source code.
44 The inner function, defining inter-dependencies, specifies the mobileeventlogs
parameter representing the MySQL database where the data records are
stored. The inter-dependencies correspond to the arrows pointing away
from the MELogService service in Figure 4.4.
45 The remainder of the expression consists of the function call to
javaenv.createTomcatWebApplication, that composes an Apache Tomcat web
application from a Java web service archive (WAR file) and an Apache
Tomcat context XML file.
46 The context XML file is generated by using the target property from the
mobileeventlogs inter-dependency argument, that provides the connection
settings of the database server.
47 The web service archive is derived from Java source code by calling the
javaenv.buildWebService function.
48 The src parameter specifies the location of the source code, which resides
in a directory on the filesystem.
49 The libs parameter specifies the libraries, which are required to build
the service. These libraries are provided by the intra-dependency argu-
ments.
4.3.3 Composing intra-dependencies
Although a Disnix expression specifies how to derive the component from
source code and its intra- and inter-dependencies, we cannot build this serv-
ice directly. As with ordinary Nix expressions, we have to compose the com-
ponent by calling this expression with the expected function arguments. In
Disnix, we first have to compose the service locally, by calling the function
with the needed intra-dependency arguments; later, we have to provide the
inter-dependency arguments as well.
Figure 4.8 shows a partial Nix expression that composes the intra-
dependencies of the SDS2 services. As may be observed, the structure of
this expression is quite similar to the all-packages.nix composition expression
described in Section 2.2.3 of the previous Chapter. The subtle difference is
that this expression only captures components that belong to SDS2 and
reuses common libraries and build tools, such as Apache Axis2 and the Java
Development Kit, by referring to the Nixpkgs repository:
50 An intra-dependency composition expression is a function taking three
arguments. The pkgs argument refers to the entire Nixpkgs collection,
providing compositions for many common packages. The system argu-
ment specifies the platform for which a service has to be built, using
{pkgs, system, distribution}: 50
rec { 51
MELogService = import ../WebServices/MELogService { 52
inherit javaenv config SDS2Util;
};
config = import ../libraries/config { 53
inherit javaenv distribution;
};
javaenv = import ../tools/javaenv {
inherit (pkgs) stdenv jdk;
};
mobileeventlogs = ...
SDS2Util = ...
...
}
Figure 4.8 all-packages.nix: Intra-dependency composition
identifiers such as i686-linux and x86_64-freebsd. The system parameter
is also passed as an argument to the composition function of the
stdenv component in Nixpkgs (not shown here), which is a component
used by virtually every other component; e.g. passing i686-linux as system
argument to stdenv will build every component for that particular type
of platform. The distribution argument refers to the distribution model
(shown later in Section 4.3.6), so that components can reflect over the
current deployment situation.
51 In the remainder of the expression, a recursive attribute set is specified
in which each value composes a component of SDS2 locally. As with
ordinary Nix composition expressions, the rec keyword is required,
because components may have to refer to each other.
52 The MELogService attribute refers to the expression described in Fig-
ure 4.7 and is called with the required intra-dependency arguments (all
the intra-dependencies of the MELogService, such as SDS2Util, are com-
posed in this expression as well).
53 The config attribute composes the config component. The config compo-
nent is an intra-dependency of all SDS2 services, providing a registry
library which knows the locations of every web service. By inheriting
the distribution parameter, the entire distribution model is passed to the
build function, so that it can be properly configured to provide informa-
tion of service locations to all SDS2 services.
{distribution, pkgs, system}: 54
let
customPkgs = import ../top-level/all-packages.nix { 55
inherit pkgs distribution system;
};
in
rec { 56
mobileeventlogs = {
name = "mobileeventlogs";
pkg = customPkgs.mobileeventlogs;
type = "mysql-database";
};
MELogService = { 57
name = "MELogService"; 58
pkg = customPkgs.MELogService; 59
dependsOn = { 60
inherit mobileeventlogs;
};
type = "tomcat-webapplication"; 61
};
SDS2AssetTracker = {
name = "SDS2AssetTracker";
pkg = customPkgs.SDS2AssetTracker;
dependsOn = {
inherit MELogService ...;
};
type = "tomcat-webapplication";
};
...
}
Figure 4.9 A partial services model for SDS2
4.3.4 Services model
Apart from specifying how each individual service should be derived from
source code and their intra-dependencies and inter-dependencies, we must
also model what services constitute a system, how they are connected to each
other (inter-dependency relationships) and how they can be activated or de-
activated. These properties are captured in a services model, illustrated in
Figure 4.9:
54 Essentially, a services model is a function taking three arguments. The
distribution argument refers to the distribution model, the system argu-
ment provides the system architecture of a target in the infrastructure
model and the pkgs argument refers to the Nixpkgs collection.
55 Here, we import the composition expression that we have defined ear-
lier in Figure 4.8. In order to allow the composition expression to find
the Nixpkgs collection and the desired architecture, we pass them as
function arguments.
56 The remainder of the services model is a mutually recursive attribute
set in which every attribute represents a service with its properties. As
with ordinary Nix expressions and the intra-dependency composition
expression, services may also have to refer to each other.
57 The MELogService attribute defines the MELogService service component
of SDS2. In principle, all the attributes in the services model correspond
to the rectangles, ovals and icons, shown earlier in Figure 4.4.
58 The name attribute specifies an identifier for MELogService.
59 The pkg attribute specifies a function that composes the service from
its inter-dependencies. This function has already been composed lo-
cally (by calling it with the required intra-dependency arguments) in
Figure 4.8.
60 The dependsOn attribute refers to an attribute set in which each attribute
points to a service in the services model, each representing an inter-
dependency of the MELogService component. The attribute set provides a
flexible way of composing services, e.g. mobileeventlogs = othermobileevent-
logs allows the user to compose a different inter-dependency relation-
ship of the MELogService (a sketch follows after this list). In principle,
these attributes in dependsOn correspond to the arrows shown earlier in
Figure 4.4.
61 The type attribute specifies what module has to be used for activation/de-
activation of the service. Examples of types are: tomcat-webapplication,
axis2-webservice, process and mysql-database. The Disnix interface will in-
voke the appropriate activation module on the target platform, e.g. the
tomcat-webapplication module will activate the given service in an Apache Tomcat
web application container.
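For instance, a variant of the services model could bind the MELogService to an
alternative database instance by rebinding its inter-dependency. The following
is only a minimal sketch; the othermobileeventlogs service is a hypothetical
attribute assumed to be defined elsewhere in the same attribute set and is not
part of SDS2:

MELogService = {
  name = "MELogService";
  pkg = customPkgs.MELogService;
  dependsOn = {
    mobileeventlogs = othermobileeventlogs;
  };
  type = "tomcat-webapplication";
};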
4.3.5 Infrastructure model
In order to deploy services in the network, we also have to specify what ma-
chines are available in the network, how they can be reached to perform de-
ployment steps remotely, what architecture they have (so that services can be
built for that type of platform) and other relevant capabilities, such as au-
thentication credentials and port numbers so that services can be activated or
deactivated. This information is captured in the infrastructure model, illus-
trated in Figure 4.10:
62 The infrastructure model is an attribute set in which each attribute cap-
tures a system in the network with its relevant properties/capabilities.
This line defines an attribute set specifying the properties of a machine
called test1.
{
test1 = { 62
hostname = "test1.net";
tomcatPort = 8080; 63
mysqlUser = "user";
mysqlPassword = "secret";
mysqlPort = 3306;
targetEPR = http://test1.net/.../DisnixService; 64
system = "i686-linux"; 65
};
test2 = {
hostname = "test2.net";
tomcatPort = 8080;
...
targetEPR = http://test2.net/.../DisnixService;
system = "x86_64-linux";
};
}
Figure 4.10 Infrastructure model for SDS2
{infrastructure}: 66
{
mobileeventlogs = [ infrastructure.test1 ]; 67
MELogService = [ infrastructure.test2 ];
SDS2AssetTracker = [
infrastructure.test1 infrastructure.test2
]; 68
...
}
Figure 4.11 Partial distribution model for SDS2
64 Some attributes have reserved use, such as targetEPR, specifying the URL
of the DisnixService which performs remote deployment operations.¹
65 The system attribute is used to specify the system architecture so that
services are built for that particular platform.
63 The other attributes can be freely chosen and capture miscellaneous
technical and non-functional properties of a machine. Furthermore, all
attributes defined in this model are passed to the activation scripts of
the given type, so that machine-specific properties are known at activa-
tion time. For example, the tomcatPort attribute specifies the TCP port
on which Apache Tomcat listens. This property is used by the Apache
Tomcat activation script to properly activate Apache Tomcat web appli-
cations.
4.3.6 Distribution model
Finally, we have to specify to which machines in the network we want to
distribute a specific service. These properties are defined in the distribution
model illustrated in Figure 4.11:
66 The distribution model is a function taking the infrastructure model as a
parameter. The remainder of the expression is an attribute set in which
every attribute name represents a service in the services model and the
attribute value is a list of machines from the infrastructure model.
67 This attribute specifies that the mobileeventlogs database should be built,
distributed and activated on machine test1.
68 By specifying more machines in a list it is possible to deploy multiple
redundant instances of the same service for tasks such as load balanc-
ing. This attribute specifies that the SDS2AssetTracker service must be
deployed on both test1 and test2.
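For instance, a minimal sketch of a distribution model that maps every service
to a single machine, which can be convenient for reproducing the system in a
test environment, looks as follows (only a few of the SDS2 services are shown):

{infrastructure}:

{
  mobileeventlogs = [ infrastructure.test1 ];
  MELogService = [ infrastructure.test1 ];
  SDS2AssetTracker = [ infrastructure.test1 ];
  ...
}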
4.4 IMPLEMENTATION
In the previous section, we have described the structure of the models that
Disnix uses and how Disnix can be used to perform the complete deployment
process of a service-oriented system. In this section, we describe how this
deployment process is implemented.
4.4.1 Deployment process
In order to deploy a system from the models we have described earlier, we
need to build all the services that are mapped to a target machine in the dis-
tribution model for the right target, then transfer the services (and all their intra-
dependencies) to the target machines and finally activate the services (and,
if necessary, deactivate obsolete services from the previous configuration). The
disnix-env tool, which performs the complete deployment process, is composed
of several tools each performing a step in the deployment process. The archi-
tecture is shown in Figure 4.12.
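As a rough sketch, the steps of disnix-env could also be carried out by invoking
its constituent tools by hand; the flags below are assumptions based on the
disnix-env interface shown earlier, and disnix-env normally chains these steps
automatically:

$ manifest=$(disnix-manifest -s services.nix -i infrastructure.nix \
    -d distribution.nix)
$ disnix-distribute $manifest
$ disnix-activate $manifest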
Building the services
The Disnix models are declarative models. Apart from the build instructions of
the services, the complete deployment process, including its order, is derived
from these models and the intra- and inter-dependencies defined in them.
In the first step of the deployment process, the disnix-manifest tool is invoked
with the three Disnix models as input parameters. The result of this tool is
a manifest file, shown in Figure 4.13, an XML file that specifies what services
¹ This attribute is used by the DisnixWebService. Disnix also provides other types of inter-
faces, which use a different attribute. Consult the Disnix manual for more details [van der Burg,
2011e].
Figure 4.12 Architecture of disnix-env. Rectangles represent objects that are ma-
nipulated by the tools. The ovals represent a tool from the toolset performing a step
in the deployment process.
need to be distributed to which machine and which services must be
activated:
69 The distribution section captures the Disnix profiles that must be dis-
tributed to each target in the network. Disnix profiles are Nix profiles
(as described in Section 2.2.6 of the previous chapter) containing sym-
links to the contents of all the services and their intra-dependencies that
should be installed on the given target machine.
70 The activation section captures all the activation properties of each service,
such as the properties of the MELogService component.
73 The contents of the name element correspond to the service name defined in
the services model.
74 The contents of the service element point to the Nix store component, under
which the build result is stored.
75 The target element contains properties of the target machine on which
the service must be activated. These properties are derived from the
infrastructure model.
72 The dependsOn section defines activation properties of all inter-dependencies
of the given service. In our example, it defines properties of the mo-
<?xml version="1.0"?>
<manifest>
<distribution> 69
<mapping>
<profile>/nix/store/ik19fyzhak5ax4s8wkixg2k4bbkwyn83-test1</profile>
<target>http://test1.net/DisnixService/services/DisnixService</target>
</mapping>
<mapping>
<profile>/nix/store/grq938g52iz3vqly7cn5r9n9l6cjrsp7-test2</profile>
<target>http://test2.net/DisnixService/services/DisnixService</target>
</mapping>
</distribution>
<activation> 70
<mapping> 71
<dependsOn> 72
<dependency>
<service>/nix/store/z1i74v0hp0909fvgda4vwx68yprhpqwa-mobileeventlogs</service>
<target>
<hostname>test1.net</hostname>
<mysqlPassword>admin</mysqlPassword>
<mysqlPort>3307</mysqlPort>
<mysqlUsername>root</mysqlUsername>
<sshTarget>localhost:2223</sshTarget>
<system>i686-linux</system>
<targetEPR>http://test1.net/DisnixService/services/DisnixService</targetEPR>
<tomcatPort>8082</tomcatPort>
</target>
</dependency>
</dependsOn>
<name>MELogService</name> 73
<service>/nix/store/hg6fy5ngr39ji4q3q968plclssljrb6j-MELogService</service> 74
<target> 75
<hostname>test2.net</hostname>
<sshTarget>localhost:2222</sshTarget>
<system>i686-linux</system>
<targetEPR>http://test2.net/DisnixService/services/DisnixService</targetEPR>
<tomcatPort>8081</tomcatPort>
</target>
<targetProperty>targetEPR</targetProperty> 76
<type>tomcat-webapplication</type> 77
</mapping>
...
<mapping> 78
<dependsOn/>
<name>mobileeventlogs</name>
<service>/nix/store/z1i74v0hp0909fvgda4vwx68yprhpqwa-mobileeventlogs</service>
...
</mapping>
</activation>
<targets> 79
<target>http://test1.net/DisnixService/services/DisnixService</target>
<target>http://test2.net/DisnixService/services/DisnixService</target>
</targets>
</manifest>
Figure 4.13 A partial Disnix manifest file
bileeventlogs database service, which is an inter-dependency of the
MELogService.
76 The targetProperty element specifies what property in the infrastructure
model is used to provide the address of the remote Disnix Service. In
/nix/store
   nqapqr5cyk...-glibc-2.9
      lib
         libc.so.6
   ccayqylscm...-jre-1.6.0_16
      bin
         java
   n7s283c5yg...-SDS2Util
      share
         java
            SDS2Util.jar
   y2ssvzcd86...-SDS2EventGenerator
      bin
         SDS2EventGenerator
      share
         java
            SDS2EventGenerator.jar
Figure 4.14 Runtime dependencies of the SDS2EventGenerator
our case, the targetEPR attribute represents the address.
77 The type attribute specifies the type of the given service. This prop-
erty is derived from the services model.
78 Likewise, the activation mappings of all other services are also defined in this
manifest file, such as the mobileeventlogs database.
79 The targets section lists all the Disnix Service addresses of the target
machines involved in the deployment process. These are used to tem-
porarily lock access in the transition phase.
The manifest file is basically a concrete version of the abstract distribution
model. While the distribution model specifies which services should be dis-
tributed to which machine, it does not reference the actual service that is built
from source code for the given target machine under the given conditions,
i.e. the component in the Nix store, such as /nix/store/x39a1...-StaffTracker. Be-
cause the concrete Nix store path names have to be determined, this results
in a side effect in which all services are built from source code for the given
target platforms. The manifest file itself is also built in a Nix expression and
stored in the Nix store.
Transferring closures
The generated manifest file is then used by the disnix-distribute tool. As ex-
plained earlier in Section 2.2.5 of the previous Chapter, Nix can detect run-
time dependencies from a build process by scanning a component for the hash
codes that uniquely identify components in the Nix store. Because the ELF
header of the java executable contains a path to the standard C library, namely
/nix/store/nqapqr5cyk...-glibc-2.9/lib/libc.so.6, we know by scanning that this specific
glibc component in the Nix store is a runtime dependency of java. Figure 4.14
shows the runtime dependencies of the SDS2EventGenerator service.
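Such a closure can also be inspected with the standard Nix command-line
interface; for example, the following query (with the store path abbreviated as
in Figure 4.14) lists the SDS2EventGenerator component together with all its
runtime dependencies:

$ nix-store -qR /nix/store/y2ssvzcd86...-SDS2EventGenerator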
The disnix-distribute tool will efficiently copy the components and their intra-
dependencies (under the runtime-dependency relation) to the machines in the
network through remote interfaces. This goal is achieved by copying the clo-
sures of the Disnix profiles defined in the manifest file. Only the components
that are not yet distributed to the target machines are copied, while keeping
components that are already there intact. As explained earlier in Section 2.2,
due to the purely functional properties of Nix, a hash code in a Nix store path
uniquely represents a particular component variant, which we do not have to
replace if it already exists, because it is identical. Only the missing paths have
to be transferred, making the transfer phase as efficient as possible.
Copying Nix components is also a non-destructive process, as no existing
files are overwritten or removed. To perform the actual copy step to a single
target, the disnix-copy-closure tool is invoked.
Transition phase
Finally, the disnix-activate tool is executed, which performs the transition phase.
In this phase, all the obsolete services are deactivated and all the new services
are activated. The right order is derived from the inter-dependencies defined
in the services model. By comparing the manifests of the current deployment
state and the new deployment state, an upgrade scenario can be derived.
During the transition phase, every service receives a lock and unlock re-
quest to which it can react, so that optionally a service can reject an up-
grade when it is not safe or temporarily queue connections so that end users
cannot observe that the system is changing. While the transition phase is on-
going, other Disnix clients are locked out, so that they cannot perform any
deployment operations, preventing the upgrade process from being disturbed.
Then, it will recursively activate all the service distributions (and their inter-
dependencies) defined in the new configuration that are marked inactive. If
a failure occurs, the newly activated services are deactivated and the deacti-
vated services are activated again. If the deactivation of a particular service
fails, the new services are deactivated and the previously deactivated services
are activated again.
If the transition phase has succeeded, the manifest file generated earlier
is stored in a Nix profile on the coordinator machine for further
reference. This manifest file is used in future deployment operations to
derive the most reliable and efficient upgrade scenario.
Using this method to upgrade a distributed system gives us two benefits.
By traversing the inter-dependency graph of a service we always have the
inter-dependencies of a service activated before activating the service itself,
ensuring that no dependency relationship gets broken. Moreover, this method
only deactivates obsolete services and activates new services, which is more
efficient than deploying a new configuration entirely from scratch.
#!/bin/bash -e

catalinaBaseDir=@CATALINA_BASE@

case "$1" in
    activate) 80
        # Link all WAR files in the deployment directory, so that they will
        # be activated with hot-deployment
        find $(readlink -f $2/webapps) -name \*.war | while read i
        do
            # Link the web service
            ln -sfn "$i" "$catalinaBaseDir/webapps/`basename "$i"`"

            # Link the configuration files if they exist
            if [ -d "$2/conf/Catalina" ]
            then
                mkdir -p $catalinaBaseDir/conf/Catalina/localhost

                for j in $2/conf/Catalina/*
                do
                    ln -sfn "$j" $catalinaBaseDir/conf/Catalina/localhost/`basename "$j"`
                done
            fi
        done
        ;;

    deactivate) 81
        # Remove WAR files from the deployment directory, so that they will
        # be deactivated with hot deployment
        find $(readlink -f $2/webapps) -name \*.war | while read i
        do
            # Remove the web service
            rm -f $catalinaBaseDir/webapps/`basename $i`

            # Also remove the configuration files if they exist
            if [ -d "$2/conf/Catalina" ]
            then
                for j in $2/conf/Catalina/*
                do
                    rm -f $catalinaBaseDir/conf/Catalina/localhost/`basename "$j"`
                done
            fi
        done
        ;;
esac
Figure 4.15 Activation module for Apache Tomcat web applications
Service activation and deactivation
Since services can have basically any form, Disnix provides activation types
which can be connected to activation modules. In essence, an activation mod-
ule is a process that takes two arguments, of which the former is either
‘activate’ or ‘deactivate’ and the latter is the Nix store path of the service that
has to be activated/deactivated.
Moreover, an activation module often needs to know certain properties of
the system, such as authentication credentials for a database or a port number
on which a certain daemon is running. Therefore, Disnix passes all the prop-
erties defined in the infrastructure model as environment variables, so that they
can be used by the activation module.
<?xml version="1.0"?>
<distributedderivation>
<mapping>
<derivation>/nix/store/nysyc9rjayqi...-MELogService.drv</derivation>
<target>http://test1.net/DisnixService/services/DisnixService</target>
</mapping>
<mapping>
<derivation>/nix/store/8j1qya7yi1hv...-mobileeventlogs.drv</derivation>
<target>http://test2.net/DisnixService/services/DisnixService</target>
</mapping>
...
</distributedderivation>
Figure 4.16 A partial distributed derivation file
Figure 4.15 shows the code for the activation module for an Apache Tomcat
web application on Linux:
80 This section defines the activation procedure of an Apache Tomcat web
application. In order to activate an Apache Tomcat web application a
symlink to the WAR file in the Nix store is created in the webapps/ di-
rectory of Tomcat, automatically triggering a hot deploy operation. Fur-
thermore, the generated context XML file is symlinked into the conf/Catali-
na/ directory, so that components requiring a database can find their
required settings.
81 This section defines the deactivation procedure. Here, the symlinks to
the WAR file and the context XML file of a web application are
removed. This operation triggers a hot undeploy operation on
Apache Tomcat.
Similar activation modules have been developed for other types, such as axis2-
webservice which will hot deploy a web service in an Axis2 container, mysql-
database which will initialise a MySQL database schema on startup or process
which will start and kill a generic process. A developer or system adminis-
trator can also implement a custom activation module used for other types
of services or use the wrapper activation module, which will invoke a wrapper
process with a standard interface included in the service. (Observe that differ-
ent platforms may require a different implementation of an activation module,
e.g. activating a web service on Microsoft Windows requires different steps
than on UNIX-based systems).
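As an illustration, a custom activation module could follow the same pattern
as Figure 4.15. The following is only a minimal sketch in which the wrapper
script inside the service and the myServicePort property are hypothetical:

#!/bin/bash -e

case "$1" in
    activate)
        # $2 is the Nix store path of the service to activate; properties from
        # the infrastructure model, such as $myServicePort, are available as
        # environment variables
        "$2"/bin/wrapper start --port "$myServicePort"
        ;;
    deactivate)
        "$2"/bin/wrapper stop
        ;;
esac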
Distributed building
Disnix provides an optional feature to build services on the target machines
instead of the coordinator machine. If this option is enabled, two additional
tools are used. The disnix-instantiate tool will generate a distributed derivation
file, as shown in Figure 4.16.
This file is also an XML file similar to the manifest file, except that it maps
Nix store derivation files (low-level specifications that Nix uses to build com-
ponents from source code, described earlier in Section 2.2.9) to target ma-
Figure 4.17 Architecture of the DisnixService
chines in the network. Because the distributed derivation file contains abso-
lute Nix store paths to these store derivation files (and these store derivations
contain references to all other build-time derivations), they are detected as
runtime dependencies and copied to the target machines.
It then invokes disnix-build with the derivation file as argument. This tool
transfers the store derivation files to the target machines in the network, builds
the components on the target machines from these derivation files, and finally
copies the results back to the coordinator machine. The disnix-copy-closure tool
is invoked to copy the store derivations to a target machine and to copy the
build result back.
Finally, it performs the same steps as it would do without this option en-
abled. In this case disnix-manifest will not build every service on the coordinator
machine, since the builds have already been performed on the machines in the
network. Also, due to the fact that Nix uses a purely functional deployment
model, a build function always gives the same result if the input parameters
are the same, regardless of the machine that performed the build.
4.4.2 Disnix Service
Some of the tools explained in the previous subsections have to connect to
the target machines in the network to perform deployment steps remotely,
e.g. to perform a remote build or to activate/deactivate a service. Remote
access is provided by the DisnixService, which must be installed on every
target machine.
Architecture
The Disnix service consists of two major components, shown in Figure 4.17.
We have a component called the core that actually executes the deployment
operations by invoking the Nix package manager to build and store compo-
nents and the activation modules package to activate or deactivate a service.
The wrapper exposes the methods remotely through an RPC protocol of choice,
such as an SSH connection or through the SOAP protocol provided by an ex-
ternal add-on package called DisnixWebService.
There are two reasons why we have chosen to make a separation between
the interface component and the core component:
Integration. Many users have specific reasons why they choose a par-
ticular protocol, e.g. due to organisational policies. Moreover, not every
protocol may be supported on every type of machine.
Privilege separation. To execute Nix deployment operations, super-user
privileges on UNIX-based systems are required. The wrapper may
need to be deployed on an application server hosting other applications,
which could introduce a security risk if it is running as super user.
Custom protocol wrappers can be trivially implemented in various pro-
gramming languages. As illustrated in Figure 4.17, the Disnix core service
can be reached through the D-Bus protocol [Freedesktop Project, 2010b], used
on many Linux systems for inter-process communication. In order to create a
custom wrapper, a developer has to map RPC calls to D-Bus calls and imple-
ment file transport of the components. D-Bus client libraries exist for many
programming languages and platforms, including C, C++, Java and .NET and
are supported on many flavours of UNIX and Windows.
Disnix interface
In order to connect to a machine in the network, an external process is in-
voked by the tools on the deployment coordinator machine. An interface can
be configured by various means, such as specifying the DISNIX_CLIENT_INTER-
FACE environment variable. The Disnix toolset includes two clients: disnix-
client, a loopback interface that directly connects to the core service through
D-Bus; and disnix-ssh-client, which connects to a remote machine by using a
SSH connection (which is the default option).
The separate DisnixWebService package includes an additional interface called
disnix-soap-client, which uses the SOAP protocol to invoke remote operations
and performs file transfers by using MTOM, an XML parser extension to in-
clude binary attachments without overhead.
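For example, the SOAP interface can be selected by pointing the coordinator
tools at disnix-soap-client; a minimal sketch using the environment variable and
client names mentioned above:

$ export DISNIX_CLIENT_INTERFACE=disnix-soap-client
$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix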
A custom interface can be trivially implemented by writing a process that
conforms to a standard interface, calling the custom RPC wrapper of the Dis-
nix service.
4.4.3 Libraries
Many of the tools in the Disnix toolset require access to information stored
in models. Since this functionality is shared across several tools we imple-
mented access to these models through libraries. The libmanifest component
provides functions to access properties defined in the manifest XML file, li-
binfrastructure provides functions to access properties defined in the infras-
tructure model and libdistderivation provides functions to access properties in a
distributed derivation file.
We prefer library interfaces for these models, since they are only ac-
cessed by tools implemented in the C programming language and do not
have to be invoked directly by end users.
Figure 4.18 An image produced by disnix-visualize, showing a deployment scenario
of SDS2 services across two machines
4.4.4 Additional tools
Apart from disnix-env and the tools that compose it, the Disnix toolset includes
several other utilities that may help a user in the deployment process of a
system:
disnix-query. Shows the installed services on every machine defined in
the infrastructure model.
disnix-collect-garbage. Runs the garbage collector in parallel on every ma-
chine defined in the infrastructure model to safely remove components
that are no longer in use.
disnix-visualise. Generates a clustered graph from a manifest file to visu-
alise services, their inter-dependencies and the machines to which they
are distributed. The dot tool [Graphviz, 2010] can be used to convert the
graph into an image, such as PNG or SVG.
An example of a visualisation is shown in Figure 4.18, demonstrating
a deployment scenario of SDS2 in which services are distributed across
two machines. As may be noticed, the architecture described earlier in
Figure 4.4 already looks complicated, and dividing services
across machines increases the complexity even further.
disnix-gendist-roundrobin. Generates a distribution model from a services
and infrastructure model, by applying the round-robin scheduling method
to divide services over the machines in equal portions and in circular
order.
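A rough usage sketch (the flags are assumptions, following the conventions of
disnix-env shown earlier) is to generate a distribution model and then deploy
it directly afterwards:

$ disnix-gendist-roundrobin -s services.nix -i infrastructure.nix \
    > distribution.nix
$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix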
4.5 EXPERIENCE
We have modelled all the SDS2 components (databases, web services, batch
processes and web application front-ends) to automatically deploy the plat-
form in a small network consisting of four 64-bit and 32-bit Linux machines.
In addition to SDS2, we have also developed a collection of publicly avail-
able example cases, demonstrating the capabilities of Disnix. These examples
also include source code. Currently, the following public examples are avail-
able:
StaffTracker (PHP/MySQL version), a distributed system composed of sev-
eral MySQL databases and a PHP web-application front-end.
StaffTracker (Web services version). A more complex variant of the pre-
vious example implemented in Java, in which web services provide in-
terfaces to the databases. This is the example system we have used
throughout the WASDeTT and SCP papers [van der Burg and Dolstra,
2010d].
StaffTracker (WCF version). An experimental port of the previous web
services example to .NET technology using C# as an implementation
language and Windows Communication Foundation (WCF) as the underlying
technology for web services. This example is supposed to be deployed
in an environment using Microsoft IIS and SQL server as infrastruc-
ture components. More details about .NET deployment can be found in
Chapter 12.
ViewVC. An example case using ViewVC (http://viewvc.tigris.
org), a web-based CVS and Subversion repository viewer implemented
in Python. The Disnix models can be used to automatically deploy
ViewVC, a MySQL database storing commit records, and a number of Sub-
version repositories.
Hello World example. A trivial example case demonstrating various ways
to compose web services. In this example, a basic composition is demon-
strated, an alternative composition by changing an inter-dependency,
and two advanced compositions, using a lookup service and load bal-
ancer, which dynamically compose services.
Disnix proxy example. A trivial example case demonstrating the use of
a TCP proxy which temporarily queues network connections during the
transition phase. This is the example we used in one of our previous pa-
pers about atomic upgrading [van der Burg et al., 2008]. Furthermore,
this is the most trivial example case using self-hosted service compo-
nents, which do not require any infrastructure components.
NixOS configurations. In this example, we show that we can also describe
complete NixOS systems as services and that we can upgrade complete
machine configurations in the network.
4.6 RESULTS
The initial deployment process (building the source code of all SDS2 com-
ponents, transferring components and their dependencies and activating the
services) took about 15 minutes. At Philips, this previously took hours in the
semi-manual deployment process on a single machine. (Manually deploying
an additional machine took almost the same amount of time and was usually
too much effort.)
We were also able to upgrade (i.e. replacing and moving services) a run-
ning SDS2 system and to perform rollbacks. In this process, only the changed
services were updated and deactivated/activated in the right order, which
only took a couple of seconds in most cases. In all the scenarios we tested
no service ran into an inconsistent state due to broken inter-dependency
connections.
The only minor issue we ran into was that during the upgrade phase, the
web application in the user’s browser (which is not under Disnix’ control) was
not refreshed, which may break certain features. However, we have reduced
the time window in which the deployed system is inconsistent to a minimum:
the installation and upgrading of packages can be done completely without
affecting a previous system configuration at all, which is hard (or sometimes
impossible) to do with conventional deployment systems. Moreover, the tran-
sition phase in which services are deactivated and activated takes in most
cases only a few seconds. Chapter 8 describes a concept and prototype ex-
ample that can be used to hide this small side-effect, to allow fully atomic
upgrades.
Although Disnix can activate databases, we currently cannot migrate them
to other machines dynamically. Disnix activates a database by using a static
representation (i.e. a dump), but is not able to capture and transfer the state of a
deployed database. In Chapter 9 we have developed a prototype replacement
for the Disnix activation scripts module capable of migrating state.
Disnix does not deploy infrastructure components, however. Although Java
web services, web applications and many other types of components can
be built and deployed automatically, infrastructure components such as the
Apache Tomcat application server or a MySQL database server must be de-
ployed by other means.
In the next chapter, we quantify deployment times of certain events and we
present results on the additional example cases.
4.7 RELATED WORK
There is a considerable amount of work on the deployment of services in dis-
tributed environments, which we can categorise into various groups.
4.7.1 Component-specific
A number of approaches limit themselves to specific types of components,
such as the BARK reconfiguration tool [Rutherford et al., 2002], which only
supports the software deployment life-cycle of EJBs and supports atomic up-
grading by changing the resource attached to a JNDI identifier.
Akkerman et al. [2005] have implemented a custom infrastructure on top
of the JBoss application server to automatically deploy Java EE components.
In their approach, they use a component description language to model com-
ponents and their connectivity and an ADL for modelling the infrastructure,
to automatically deploy Java EE components to the right application servers
capable of hosting the given component. The goals of these models are similar
to the Disnix services and infrastructure models; however, they are designed
specifically for Java EE environments. Moreover, the underlying deployment
tooling does not have capabilities such as storing components safely in isolation
and performing atomic upgrades.
In [Mikic-Rakic and Medvidovic, 2002] an embedded software architecture
is described that supports upgrading parts of the system as well as having
multiple versions of a component next to each other. A disadvantage is that
the underlying technology, such as the infrastructure and run-time system,
cannot be upgraded and that the deployment system depends on the technol-
ogy used to implement the system.
Other approaches use component models and language extensions, such
as [Chen and Simons, 2002] which proposes a conceptual component frame-
work for dynamic configuration of distributed systems. In [Costanza, 2002]
the authors developed the GILGUL extension to the Java language for dy-
namic object replacement. Using such approaches requires the developers or
system administrators to use a particular framework or language.
4.7.2 Environment-specific
Approaches for specific environments also exist. GoDIET [Caron et al., 2006]
is a utility for deployment of the DIET grid computing platform. It writes
configuration files, stages the files to remote resources, provides an appropri-
ately ordered and timed launch of components, and supports management
and tear-down of the distributed platform. In [Laszewski et al., 2002] the
Globus toolkit is described for performing three types of deployment scenar-
ios using predefined types of component compositions in a grid computing
environment. CODEWAN [Roussain and Guidec, 2005] is a Java-based plat-
form designed for deployment of component-based applications in ad-hoc
networks.
4.7.3 General approaches
Several generic approaches have been developed. The Software Dock is a dis-
tributed, agent-based deployment framework to support ongoing cooperation
and negotiation among software producers and consumers [Hall et al., 1999].
It focuses on the delivery process of components from producer to
consumer site, rather than on the complete deployment process. In [Dumitras et al., 2007]
a dependency-agnostic upgrade concept for running distributed systems is
described in which the old and new configurations of a distributed system are
deployed next to each other in isolated runtime environments. Separate mid-
dleware forwards requests to the new configuration at a certain point in time.
TACOMA [Sudmann and Johansen, 2002] uses agents to perform deployment
steps and is built around RPM. This approach has various limitations, such as
the inability to perform atomic upgrades or safely install multiple variants of
components next to each other. Disnix overcomes these limitations.
In [Eilam et al., 2006], a model-driven application deployment architecture
is described, containing various models capturing separate concerns in a de-
ployment process, such as the application structure, network topology of the
data center, and transformation rules. These models are provided by various
stakeholders involved in the deployment process of a distributed application
in data centers, such as developers, domain experts and data center op-
erators. These models are transformed and optimised by various tools into
a concrete deployment plan which can be used to perform the actual de-
ployment process, taking various functional requirements and non-functional
requirements into account. These techniques are applied in various contexts,
for example to automatically compose services in a cloud environment [Kon-
stantinou et al., 2009] and can be bridged to various tools to perform the
deployment and provisioning [El Maghraoui et al., 2006].
The models used by Disnix in this chapter are merely executable specifica-
tions used to perform the actual deployment process and not designed for de-
ployment planning. Moreover, the semantics of the Nix expression language
that Disnix uses are close to the underlying purely functional deployment
model. However, in the next chapter, we present an extension implemented on
top of Disnix, providing additional models to specify a policy of mapping ser-
vices to machines in the network based on functional properties defined in the
service and infrastructure models. Also, various deployment planning algo-
rithms described in the literature can be used for this purpose. In conjunction
with a discovery service, service-oriented systems can be made self-adaptive.
It could also be possible to integrate Disnix components into the architecture
of [Eilam et al., 2006], so that their deployment planning techniques can be
combined with our deployment mechanics to make a software deployment process
reliable.
The Engage system [Fischer et al., 2012] has a large conceptual overlap
with Disnix. Engage also allows components and infrastructure to be mod-
elled as separate entities from a high-level perspective and makes a distinc-
tion between components residing on the same machine (local dependencies),
within a container (inside dependencies) and dependencies between compo-
nents that may reside on different machines in the network (peer dependen-
cies). A major difference between Engage and Disnix is that Engage uses a
constraint-based algorithm to resolve dependencies, whereas Disnix uses the
deployment model of Nix in which services are constructed and configured
by function calls. Moreover, Engage uses an imperative deployment facility to
perform the actual deployment steps.
Another notable aspect that sets Disnix apart from other deployment tools
is that every service component must be managed through the Nix store (al-
though there are ways to refer to components on the host system, which is
not recommended). Disnix cannot be used to manage the deployment of ex-
isting service components deployed by other means. If we were able to
refer to components outside this model, we would lose all the advantages that the
purely functional deployment model of Nix offers, such as the fact that we can
uniquely distinguish between variants of components, store them safely next
to each other, and safely and (nearly) atomically upgrade components. If the
DisnixOS extension is used (described later in Chapter 6), the underlying
infrastructure components are also managed by Nix; consequently, DisnixOS cannot be used to
manage infrastructure components deployed by other means. In some cases,
upgrades with Disnix can be more expensive than with other deployment tools,
since everything has to be managed by the Nix package manager and compo-
nents are bound to each other statically, but in return the Nix deployment
model provides very powerful reliability properties.
4.7.4 Common practice
In practice, many software deployment processes are partially automated by
manually composing several utilities together in scripts and using those to per-
form deployment tasks. While this gives users some kind of specification and reproducibil-
ity, it cannot ensure properties such as correct deployment.
4.8 CONCLUSION
We have shown Disnix, a distributed deployment extension to Nix, used to
automatically deploy, upgrade and roll back a service-oriented system such
as SDS2 consisting of components of various types in a network of machines
running on different platforms. Our experiments have shown that we can
automatically and reliably deploy the application components of SDS2 and
that the automated process yields a significant improvement in deployment
time, compared to the original semi-manual deployment process. In addi-
tion to SDS2, we have also developed a collection of example systems im-
plemented in various programming languages and using various component
technologies.
Because Disnix is extensible, takes inter-dependencies into account, and
builds on the purely functional properties of Nix, we can safely upgrade only
necessary parts, and deactivate and activate the required services of a system,
making the upgrade process efficient and reliable.
In Chapter 6 we show DisnixOS, an extension to Disnix which can deploy
infrastructure components next to service components. Chapter 9 illustrates
Dysnomia, a proof-of-concept tool providing automated deployment of the mu-
table state of containers (called mutable components).
5
Self-Adaptive Deployment of
Service-Oriented Systems
ABSTRACT
In addition to the deployment challenges that we have described in the pre-
vious chapter, usually the environment in which distributed systems are de-
ployed is dynamic: any machine in the network may crash, network links may
temporarily fail, and so on. Such events may render the system partially or
completely unusable. If an event occurs, it is difficult and expensive to re-
deploy the system to take the new circumstances into account. In this
chapter, we present a self-adaptive deployment framework built on top of
Disnix. This framework dynamically discovers machines in the network and
generates a mapping of components to machines based on non-functional
properties. Disnix is then invoked to automatically, reliably and efficiently
redeploy the system.
5.1 INTRODUCTION
As explained in the previous chapter, service-oriented systems are composed
of distributable services, such as web services, databases, web applications
and batch processes, working together to achieve a common goal. Service-
oriented systems are usually deployed in networks of machines (such as a
cloud environment) having various capabilities and properties, because there
are various technical and non-functional requirements that must be met. For
example, a machine assigned for storage may not be suitable for running a
component doing complex calculations or image rendering. We also want to
deploy services providing access to privacy-sensitive information on a ma-
chine with restricted access.
After a service-oriented system is deployed, it is not guaranteed that it
will stay operational. In networks, various events may occur. A machine may
crash and disappear from the network, or a new machine may be added.
Moreover, a capability of a machine could change, such as an increase of
memory. Some of these events may partially or completely break a system
or make the current deployment suboptimal, in which certain non-functional
deployment requirements are no longer met.
In order to take new circumstances into account after an event, a system
must be redeployed. As we have seen, deployment processes of service-oriented
systems are usually difficult and expensive. For instance, they may require
system administrators to perform manual deployment steps, such as installing
components or editing configuration files. In many cases, it is a lot of effort to
build components, transfer the components and all dependencies to the right
machines and to activate the system. Upgrading is even more difficult as this
may completely break the system. Furthermore, finding a suitable mapping of
services to machines is a non-trivial process, since non-functional properties
of the domain (such as quality-of-service requirements) must be supported.
These factors cause a large degree of inflexibility in reacting to events.
Even though Disnix can reliably and conveniently automate a deployment
process of a service-oriented system, it requires specifications which must
be written manually by developers or system administrators. For example,
in order to react to an event, a new distribution model must be provided
satisfying all required technical and non-functional deployment requirements,
which is a laborious and tedious process.
In this chapter, we describe a self-adaptive deployment extension built on
top of Disnix, the distributed deployment tool for service-oriented systems
described in the previous chapter. This framework continuously discovers
machines and their capabilities in a network. In case of an event, the system
is automatically, reliably and efficiently redeployed to take the new circum-
stances into account. A quality-of-service (QoS) policy model can be specified
to find a suitable mapping of services to machines dealing with desirable
deployment constraints from the domain. We have applied this deployment
extension to several case studies, including SDS2, the industrial case study
from Philips Research covered in the previous chapter.
In contrast to other self-adaptive approaches dealing with events in dis-
tributed environments, Disnix redeploys a system instead of changing proper-
ties of already deployed components, such as their behaviour and connections.
The advantages of this approach are that various types of services are sup-
ported (which may not all have self-adaptive properties) and that we do not
require components to use a specific component technology to make them
self-adaptive.
5.2 MOTIVATION
We have applied the dynamic Disnix extension to several use cases, includ-
ing two custom developed systems, an open-source case study, and SDS2,
the industrial case study covered in the previous chapter. Each case study
is implemented using different technologies, serves different goals and has
different requirements. We will use these throughout this chapter to motivate
and explain our work.
5.2.1 StaffTracker (PHP/MySQL version)
First, we present two toy systems that represent two different styles of dis-
tributed systems. Both implement the same application: allowing users to
store and query information about staff members of a university, such as
names, IP addresses and room numbers. From a room number of a staff mem-
ber, a zip code can be determined; from a zip code, the system can determine
the street and city in which the staff member is located. The first variant uses
a web interface written in PHP that accesses data stored in different MySQL
databases.
Figure 5.1 Architecture of the StaffTracker (PHP/MySQL version)
Figure 5.1 shows the architecture of this variant of the StaffTracker system.
Each object in the figure denotes a distributable component, capable of run-
ning on a separate machine in the network. The arrows denote inter-
dependency relationships, which are dependencies between distributable com-
ponents. The StaffTracker system is composed of various MySQL databases
(denoted by a database icon) storing a dataset, and a web application front-
end implemented in PHP (denoted by a rectangle) allowing end users to query
and modify staff members.
In order to deploy this system, several technical requirements must be met.
First, all MySQL databases must be deployed on a machine running a MySQL
DBMS. Secondly, the web application front-end must be deployed on a ma-
chine running an Apache web server instance.
Various events may occur in a network of machines running this system. A
machine may crash, making it impossible to access a specific dataset. More-
over, if the machine running the web application front-end crashes, the sys-
tem becomes completely inaccessible. It is also possible that a new machine
is added, such as a database server, which may offer additional disk space. In
such cases, the system must be redeployed to take the new circumstances into
account.
To redeploy this system, a system administrator must install databases on a
MySQL server or web applications on an Apache web server. Moreover, he has
to adapt the configuration of the web application front-end, so that a database
from a different server is used. Furthermore, these steps must be performed
in the right order. If, for instance, the system administrator activates the web
application front-end before all databases are activated, specific features may
be inaccessible to end users.
5.2.2 StaffTracker (Web services version)
The second variant, shown in Figure 5.2, represents a SOA-style system based
on separate web services that each provide access to a different database. Ad-
ditionally, it provides a GeolocationService web service using GeoIP [MaxMind,
2010] to determine the location of a staff member, based on his or her IP
address. The web services and web application components of this variant of
the StaffTracker are implemented in Java and require Apache Tomcat
(http://tomcat.apache.org) for web application hosting and management.
Figure 5.2 Architecture of the StaffTracker (Web services version)
In order to deploy this system correctly, we must deploy MySQL databases
on machines running a MySQL DBMS (similar to the previous use case), but
we must also deploy web service and web application components on ma-
chines running Apache Tomcat.
In case of an event, redeployment takes even more effort than in the previous
example. If the location of a database or web service changes, a system ad-
ministrator has to adapt all the configuration files of all dependent services.
Moreover, the system administrator has to repackage the web services and
web applications, transfer them to the right machines in the network and de-
activate and activate them in the right order (otherwise features may become
inaccessible).
5.2.3 ViewVC
ViewVC (http://viewvc.tigris.org) is a free and open-source web-based Subversion and CVS repository
viewer implemented in Python. ViewVC is able to connect to remote Subver-
sion repositories in a network. It can optionally use a MySQL database to
query commit records used for development analysis tasks. The architecture
is shown in Figure 5.3.
As in the previous examples, we must ensure that the web application
front-end is deployed on a machine running an Apache web server and the
database on a machine running a MySQL DBMS. We must also ensure that
Subversion repositories are deployed on machines running svnserve, which
provides remote access to a Subversion repository through the SVN protocol.
Figure 5.3 Architecture of ViewVC
5.2.4 SDS2
SDS2 [de Jonge et al., 2007] is the industrial case study we have used as main
motivation in the previous chapter. As explained earlier, medical devices
generate status and location events and SDS2 is a service-oriented architecture
designed by Philips Research for Asset Tracking and Utilisation in hospitals.
Apart from the technical requirements for deployment, such as mapping
services with a specific type to machines capable of running them, there are
also non-functional requirements which must be supported. For example, the
datasets containing event logs must be stored inside a hospital environment
for performance and privacy reasons. The services generating workflow anal-
ysis results and summaries must be deployed inside the Philips enterprise
environment, since Philips has a more powerful infrastructure and may want
to use generated data for other purposes.
In order to redeploy SDS2, nearly every service must be adapted (due to the
fact that every service uses a global configuration file containing the locations
of all services), repackaged, transferred to the right machines in the network
and deactivated and activated in the right order (a process which normally
takes several hours to perform manually).
5.3 ANNOTATING DISNIX MODELS
As explained in the previous chapter, we have developed Disnix, a distributed
deployment tool built on top of the Nix package manager to deal with the
complexity of the deployment process of service-oriented systems. Disnix
offers various advantages compared to related tooling.
We have also explained that Disnix deployment processes are automated
by three models: the services model, specifying of which services a system is
composed and how they are connected; the infrastructure model, capturing the
available machines in the network and their relevant properties/capabilities;
and the distribution model, mapping services to machines. In order to dynami-
cally redeploy systems, some of these models must be annotated with relevant
technical and non-functional properties.
{distribution, pkgs, system}:
let
  customPkgs = import ../top-level/all-packages.nix {
    inherit pkgs distribution system;
  };
in
rec {
  mobileeventlogs = {
    name = "mobileeventlogs";
    pkg = customPkgs.mobileeventlogs;
    type = "mysql-database";
    stateful = true;
    requiredZone = "hospital"; 82
  };
  MELogService = {
    name = "MELogService";
    pkg = customPkgs.MELogService;
    dependsOn = {
      inherit mobileeventlogs;
    };
    type = "tomcat-webapplication";
    requiredZone = "hospital";
  };
  SDS2AssetTracker = {
    name = "SDS2AssetTracker";
    pkg = customPkgs.SDS2AssetTracker;
    dependsOn = {
      inherit MELogService ...;
    };
    type = "tomcat-webapplication";
    requiredZone = "philips"; 83
  };
  ...
}
Figure 5.4 Annotated services.nix services model for SDS2
5.3.1 Services model
Figure 5.4 shows an annotated services model for SDS2. This model is almost
entirely the same as the services model shown in the previous chapter. The
model captures the same properties, such as the available services, how to
build them, their inter-dependencies and their types. Additionally, each serv-
ice is augmented with custom attributes, describing miscellaneous technical
properties and non-functional properties:
82 The requiredZone attribute is a custom attribute, which we have used to
specify that a component must be deployed in a particular zone. In this
case, we have specified that the mobileeventlogs service must be deployed
on a machine physically located in the hospital environment. This is a very
important requirement, which we have also outlined in the previous
chapter.
83 Here, the requiredZone attribute specifies that the SDS2AssetTracker service
should be deployed inside a Philips data center.
{
  test1 = {
    hostname = "test1.net";
    tomcatPort = 8080;
    mysqlUser = "user";
    mysqlPassword = "secret";
    mysqlPort = 3306;
    targetEPR = http://test1.net/.../DisnixService;
    system = "i686-linux";
    supportedTypes = [
      "process" "tomcat-webapplication" "mysql-database" "ejabberd-database" 84
    ];
    zone = "hospital"; 85
  };
  test2 = {
    hostname = "test2.net";
    tomcatPort = 8080;
    ...
    targetEPR = http://test2.net/.../DisnixService;
    system = "x86_64-linux";
    supportedTypes = [ "process" "tomcat-webapplication" ];
    zone = "philips";
  };
}
Figure 5.5 An annotated infrastructure.nix model for SDS2
These custom attributes are not directly used by the basic Disnix toolset for
deployment, but can be used for purposes such as expressing QoS require-
ments and to dynamically deploy services, as we will see in Section 5.5.
5.3.2 Infrastructure model
Likewise, infrastructure models can also be annotated with custom properties
capturing technical and non-functional properties. In Figure 5.5, we have
annotated the SDS2 infrastructure model with two custom properties:
84 The supportedTypes attribute is a list specifying what types of services
can be deployed on the given machine. For example, the test1 machine
is capable of running generic UNIX processes, Apache Tomcat web ap-
plications, MySQL databases and ejabberd databases.
85 The zone attribute specifies in which zone the machine is located. In our
example, test1 is located in the hospital environment and test2 is located
in the Philips environment.
5.4 SELF-ADAPTIVE DEPLOYMENT
As we have explained in the previous chapter, Disnix is a good basis for
distributed deployment, because it can safely install components without in-
terference and reliably install or upgrade a distributed system. Moreover, it
makes the time of the transition phase (in which the system is inconsistent)
as small as possible. All these features significantly reduce the redeployment
effort of a system.
Figure 5.6 A self-adaptive deployment architecture for Disnix
However, Disnix requires an infrastructure model which is specified man-
ually by a developer or system administrator. For large networks in which
systems may disappear or be added, or properties may change, this is infeasible
to maintain. Moreover, for each event in a network, a new distribution model
must be specified. Not every machine may be capable of running a particular
service defined in the service model. Manually specifying a new suitable dis-
tribution taking certain non-functional requirements into account is usually
difficult and expensive.
In order to react automatically to events occurring in the network, we have
extended the Disnix toolset by providing automatic generation of the infras-
tructure and distribution models.
Figure 5.6 shows an extended architecture of Disnix to support self-adaptive
deployment. Rectangles represent objects that are manipulated by the tools.
The ovals represent a tool from the toolset performing a step in the deploy-
ment process. In this architecture several new tools are visible:
The infrastructure generator uses a discovery service to detect machines
in the network and their capabilities and properties. An infrastructure
model is generated from these discovered properties.
The infrastructure augmenter adds additional infrastructure properties to
the generated infrastructure model that are not announced by the ma-
chines. (Some properties are not desirable to announce because they are
privacy-sensitive, such as authentication credentials).
The distribution generator maps services in the services model to targets
defined in the generated and augmented infrastructure model. A quality
of service (QoS) model is used to specify a policy describing how this
mapping should be performed, based on quality attributes specified in
the services and infrastructure model.
In this framework, a user provides three models as input parameters:
The services model, as we have seen in Section 5.3.1, captures all the
services constituting the system.
The augmentation model is used to add infrastructure properties to the
infrastructure model, which cannot be provided by the discovery serv-
ice, such as authentication credentials.
The quality-of-service model (QoS) is used to describe how services can
be distributed to target machines, based on QoS properties defined in
the services and infrastructure models.
After the infrastructure and distribution models have been generated, they
are passed to the disnix-env tool along with the services model. The disnix-
env tool then performs the actual deployment and the system is upgraded
according to the new distribution. By continuously executing this process
when events occur, the system becomes self-adaptive.
5.4.1 Infrastructure generator
We have implemented an infrastructure generator based on Avahi, a multicast
DNS (mDNS) and DNS service discovery (DNS-SD) implementation [Freedesk-
top Project, 2010a]. We have chosen Avahi for practical reasons, because it is
light-weight and included in many modern Linux distributions. Although we
only have a DNS-SD-based infrastructure generator, the architecture of the
self-adaptive deployment extension is not limited to a particular discovery
protocol. The infrastructure generator is a process that can be relatively easily
replaced by a generator using a different discovery protocol, such as SSDP.
Every target machine in the network multicasts properties and capabilities
in DNS records through DNS-SD. Some infrastructure properties are pub-
lished by default, such as the hostname which is required for the coordinator
machine to perform remote deployment steps, system specifying the operat-
ing system and architecture, and supportedTypes which specifies the types of
services which can be activated on the machine, such as mysql-database and
tomcat-webapplication. The list of supported types is configured by the installa-
tion of the DisnixService, which, for instance, provides a mysql-database activa-
tion script if the configuration script detects a MySQL instance running on the
target machine. Moreover, some hardware capabilities are published, such as
mem which publishes the amount of RAM available. A system administrator
can also publish custom properties through the same service as DNS records,
such as zone representing the location in which the machine is based.
{infrastructure, lib}: 86
lib.mapAttrs 87
  (target:
    if target ? mysqlPort 88
    then target // {
      mysqlUsername = "foo"; mysqlPassword = "secret"; 89
    }
    else target
  ) infrastructure
Figure 5.7 augment.nix: A model augmenting the infrastructure model
The infrastructure generator captures the DNS-SD properties announced
by the target machines and uses it to generate an infrastructure model. The
generation step is quite trivial, because DNS records consist of key = value
pairs that map onto Nix attribute sets in a straightforward manner.
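As an illustration, the sketch below shows what a generated infrastructure model could look like for a single machine that announces its hostname, system type, supported service types, memory capacity and a custom zone property. The attribute names follow the conventions of Figure 5.5, but the exact output of the generator may differ; note that no authentication credentials are present yet.
{
  test1 = {
    # published by default through DNS-SD
    hostname = "test1.net";
    system = "i686-linux";
    supportedTypes = [ "process" "tomcat-webapplication" "mysql-database" ];
    # hardware capability announced by the machine (hypothetical value)
    mem = 2048;
    # custom property published by the system administrator
    zone = "hospital";
  };
}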
5.4.2 Infrastructure augmenter
As mentioned before, it may not be desirable to publish all infrastructure
properties through a discovery service, such as authentication credentials re-
quired to deploy a MySQL database. Therefore, the generated infrastructure
model may need to be augmented with additional infrastructure properties.
Figure 5.7 shows an example of an augmentation model:
86 The augmentation model is a function taking two arguments. The infras-
tructure parameter represents the infrastructure model generated by the
discovery service. The lib parameter refers to an attribute set of library
functions, provided by Nixpkgs. The remainder of the model modifies
the attribute set containing infrastructure properties.
87 In the remainder of the augmentation model, we call the lib.mapAttrs func-
tion to iterate over each target attribute in the infrastructure model at-
tribute set.
88 Here, we have defined that if a MySQL port is defined in a machine
attribute set, we augment the MySQL authentication credentials 89 so
that a database can be deployed.
The strategy of augmenting additional infrastructure properties depends
on the policy used for the infrastructure and the domain in which the system
is used. In this example, we assume that every MySQL server uses the same
authentication credentials. There may be situations in which each individual
machine requires a separate username and password or perhaps groups of
machines. Other properties and transformations can be specified in this model
by using Nix expression language constructs and functions defined in the lib
attribute set.
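The sketch below illustrates one such alternative, under the assumption that per-machine credentials are kept in a small table on the coordinator machine. The credentials attribute set and its contents are hypothetical, and lib.mapAttrs is used here with its two-argument callback so that the machine name can serve as a lookup key.
{infrastructure, lib}:
let
  # hypothetical per-machine credential table kept on the coordinator machine
  credentials = {
    test1 = { mysqlUsername = "foo"; mysqlPassword = "secret1"; };
    test2 = { mysqlUsername = "bar"; mysqlPassword = "secret2"; };
  };
in
lib.mapAttrs
  (name: target:
    # merge the credentials of the machine into its properties, if defined
    if builtins.hasAttr name credentials
    then target // builtins.getAttr name credentials
    else target
  ) infrastructure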
{ services, infrastructure
, initialDistribution, previousDistribution, filters}: 90
filters.filterB { 91
  inherit services infrastructure;
  ...
  distribution = filters.filterA {
    inherit services infrastructure;
    distribution = initialDistribution;
    ...
  };
}
Figure 5.8 qos.nix: A quality-of-service model implementing a policy
5.4.3 Distribution generator
Given the automatically discovered infrastructure model, Disnix must now
generate a distribution model that maps services onto machines. Automat-
ically generating a suitable distribution is not a trivial task. In some cases
it is difficult to specify what is an optimal distribution, because optimising
a particular attribute may reduce the quality of another important quality
attribute, a phenomenon called Pareto optimality in multidimensional optimi-
sation [Medvidovic and Malek, 2007; Malek et al., 2012]. Moreover, in some
cases finding a suitable distribution is an NP-hard problem for which no algo-
rithm that solves it in polynomial time is known, which makes it infeasible
to find an optimal distribution for large systems.
Furthermore, in many cases it is desired to implement constraints from the
domain, which must be reflected in the distribution of services to machines.
These constraints are largely context-specific. For example, constraints in the
medical domain (such as privacy) are not always as important in other do-
mains in which performance may be much more important. Also, the notion
of certain quality attributes may be different in each domain.
Because of all these issues, there is no silver bullet that always generates a
suitable distribution for a service-oriented system. Therefore, a developer or
system administrator must specify a suitable deployment strategy in a quality
of service model to deal with non-functional requirements from the domain.
The Disnix toolset contains various functions that assist users with specifying
deployment strategies. Apart from a number of included functions, custom
algorithms and tooling can be easily integrated into the toolset to support
desirable distribution strategies.
A distribution model can be generated from a service and infrastructure
model by using a quality-of-service model in which developers or system ad-
ministrators can combine filter functions that determine to which machine a
service can be deployed, by taking certain technical and non-functional con-
straints of services and machines into account.
The general structure of a quality-of-service model is shown in Figure 5.8:
90 A QoS model is a function taking several arguments. The services argu-
ment represents the services model and the infrastructure argument rep-
resents the infrastructure model. The initialDistribution represents a dis-
tribution model which maps services to candidate targets. Normally,
every target defined in the infrastructure model is considered a candi-
date host for every service. The previousDistribution argument represents
the distribution model of the previous deployment (or null in case of an
initial deployment). This attribute set is derived by transforming the
last manifest file stored in the Nix coordinator profile to a distribution
expression. This attribute set can be used to reason about the upgrade
effort or to keep specific services on the same machines. The filters at-
tribute represents an attribute set containing various functions, which
filter and divide services defined in the initial distribution model.
91 In the body of the model, a developer must filter and transform the
initialDistribution attribute set into a suitable distribution, by applying vari-
ous functions defined in filters. Every function takes a distribution model
as input argument and optionally uses the services and infrastructure
models. The result of a function is a new distribution model taking the
effects of the filter into account. The result can be passed to another filter
function or used for the actual deployment. The process of composing
filter functions is repeated until a desirable distribution is achieved.
5.5 IMPLEMENTING A QUALITY OF SERVICE MODEL
The QoS policy specifications in Disnix consist of two parts, which are in-
spired by [Heydarnoori et al., 2006; Heydarnoori and Mavaddat, 2006]. First,
a candidate host selection phase is performed, in which for each service suitable
candidate hosts are determined. This phase sets the deployment boundaries
for each service. After a suitable candidate host selection is generated, the
division phase is performed in which services are allocated to the candidate
hosts. The dynamic deployment framework offers various functions defined
in the Nix expression language that can be composed to generate a desirable
distribution strategy dealing with non-functional properties from the domain.
5.5.1 Candidate host selection phase
Before we can reliably assign services to machines in the network, we must
first know on which machines a service is deployable. There are many con-
straints which render a service undeployable on a particular machine. First,
there are technical constraints. For example, in order to deploy a MySQL
database on a machine in the network, a running MySQL DBMS is required.
Moreover, a component may also have other technical limitations, e.g. if a
component is designed for Linux-based systems it may not be deployable on
a different operating system, such as Microsoft Windows.
Second, there are non-functional constraints from the domain. For instance,
in the medical domain, some data repositories must be stored within the phys-
ical borders of a specific country because this is required by the law of that
particular country. In order to filter a distribution, various mapping functions
are available.
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.mapAttrOnAttr {
  inherit services infrastructure; 92
  distribution = initialDistribution; 93
  serviceProperty = "requiredZone"; infrastructureProperty = "zone"; 94
}
Figure 5.9 Candidate host selection using mapAttrOnAttr
One-to-One mapping
The function mapAttrOnAttr can be used to map a value of an attribute defined
in the services model onto a value of an attribute defined in the infrastructure
model.
Figure 5.9 shows an example application of this function, which creates
a candidate host selection for the service model defined in Figure 5.4 and
infrastructure model defined in Figure 5.5 ensuring that a service that must
be deployed in a particular zone is mapped onto machines located in that
zone:
92 The mapAttrOnAttr function requires the services and infrastructure mod-
els, which can be provided without any modifications.
93 The distribution model that we want to filter is the initial distribution
model, which considers every machine in the infrastructure model as a
potential candidate for every service.
94 The arguments on this line specify that the requiredZone attribute defined
in the services model must be mapped onto the zone attribute in the
infrastructure model.
The result of the expression in Figure 5.9 is a new distribution model, in
which every service belonging to a particular zone refers to machines that are
located in that zone.
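As an illustration, for the services model of Figure 5.4 and the infrastructure model of Figure 5.5, the resulting candidate mapping could resemble the sketch below. The notation mirrors an ordinary Disnix distribution model; the representation used internally by the generator may differ.
{infrastructure}:
{
  # services restricted to the hospital zone only list hospital machines
  mobileeventlogs = [ infrastructure.test1 ];
  MELogService = [ infrastructure.test1 ];
  # services restricted to the Philips zone only list Philips machines
  SDS2AssetTracker = [ infrastructure.test2 ];
}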
One-to-Many mapping
The function mapAttrOnList can be used to map a value of a service attribute
onto a value defined in a list of values defined in the infrastructure model.
Figure 5.10 shows how this function can be used to create a candidate host
selection that ensures that a service of a specific type is mapped to machines
supporting that particular type:
95 This line specifies that the type attribute in the ser vices model is mapped
onto one of the values in the supportedTypes list.
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.mapAttrOnList {
  inherit services infrastructure;
  distribution = initialDistribution;
  serviceProperty = "type"; infrastructureProperty = "supportedTypes"; 95
}
Figure 5.10 Candidate host selection using mapAttrOnList
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.mapStatefulToPrevious {
  inherit services;
  distribution = initialDistribution;
  inherit previousDistribution;
}
Figure 5.11 Candidate host selection using mapStatefulToPrevious
The expression described in Figure 5.10 is very valuable for all our example
cases. This function ensures that a MySQL database is deployed on a machine
running a MySQL DBMS, an Apache Tomcat web application is deployed on a
server running Apache Tomcat, and that other services of a specific type are
deployed on the right machines.
Many-to-One mapping
Similarly, a function mapListOnAttr is provided to map a list of values in the
service model onto an attribute value defined in the infrastructure model.
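No figure is included for this function in this chapter, but by analogy with Figures 5.9 and 5.10 an invocation could look like the sketch below. The supportedZones service attribute (a list of zones in which a service may run) and the argument names are assumptions used only for illustration.
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.mapListOnAttr {
  inherit services infrastructure;
  distribution = initialDistribution;
  # hypothetical: each service lists the zones in which it may run,
  # and each machine defines a single zone attribute
  serviceProperty = "supportedZones"; infrastructureProperty = "zone";
}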
Restrict stateful components
Another issue is that services of a system may be stateful. For instance, data
in a MySQL database may change after it has been deployed. Disnix does
not migrate these changes to another machine in case of a redeployment.
Furthermore, migrating very large data sets can be very expensive.
By defining the property stateful = true in the services model (for example,
for the mobileeventlogs service in Figure 5.4), a developer can specify that a
service has mutable state.
By using the mapStatefulToPrevious function, as shown in Figure 5.11, a de-
veloper can specify that in case of an upgrade, the stateful service is mapped
only to the hosts of the previous deployment scenario. This ensures that a
service such as a MySQL database is not moved and the data is preserved.
Filter unbuildable components
There is also a brute-force filterBuildable function available, which tries to in-
stantiate the derivation function of every service for every type of ma-
chine in the infrastructure model and checks whether no exceptions are
thrown. As a consequence, combinations that cannot be built are automatically filtered
out.
Although this function can be very useful in some cases, it is also quite time-
consuming to evaluate all build functions of a system. Therefore, it is not
enabled by default. Preferably, users should use other strategies to determine
whether a target is a suitable candidate for a service.
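By analogy with the other filter functions, enabling this filter in a quality-of-service model could look like the sketch below; the argument names are an assumption.
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.filterBuildable {
  inherit services infrastructure;
  distribution = initialDistribution;
}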
5.5.2 Division phase
After a candidate host selection has been made, the services must be divided
over the candidate hosts by using a suitable strategy, as there are typically still
multiple machines available to host particular services. In the literature, vari-
ous deployment planning algorithms are described that try to find a suitable
mapping of services to machines in certain scenarios. However, there is no
algorithm available that always provides the best distribution for a particular
system, nor is it always feasible to find an optimal distribution. Therefore,
suitable strategies must be selected and configured by the user.
Interfacing with external tools
Disnix uses the Nix expression language to specify all the models that it is
using. The Nix expression language is a domain-specific language designed
for describing build actions, but it is not really suitable for implementing deploy-
ment planning algorithms, which often use arithmetic operations that are
poorly supported in the Nix expression language. Therefore, Disnix provides
an interface for integrating external algorithms as separate tools and also caches the
result in the Nix store like software components (i.e. if the same algorithm is
executed with the same parameters the result from the Nix store is immedi-
ately returned instead of performing the algorithm again).
This interface basically converts pieces of the services and infrastructure
models into XML. These XML representations can then be transformed into
another model (e.g. by using XSLT) and used by an external tool imple-
menting a particular division algorithm. The output of the process is then
transformed into a Nix expression representing a new distribution model and
used by other functions defined in the quality of service model. A notable
property of the interface is that it also uses Nix to invoke the external division
algorithm the same way as it builds software components. This offers us the
advantage that the result of the division algorithm is stored in the Nix store
(just as ordinary components) and that the result is cached.
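The sketch below indicates how such an external tool could be invoked through a Nix derivation, so that its result ends up in the Nix store and is therefore cached. The division-tool program, its command-line options and the parameter names are hypothetical and do not necessarily reflect the actual Dynamic Disnix implementation.
{stdenv, divisionTool, servicesXML, infrastructureXML}:
stdenv.mkDerivation {
  name = "distribution-model";
  buildInputs = [ divisionTool ];
  buildCommand = ''
    # division-tool is a hypothetical external program implementing a
    # division algorithm; it reads both XML models and writes a new
    # distribution expression to its standard output
    division-tool --services ${servicesXML} \
      --infrastructure ${infrastructureXML} > $out
  '';
}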
A one-dimensional division method
We have implemented a number of division methods. The most trivial method
is a one-dimensional division generator, which uses a single variable from the
services and infrastructure models and has three strategies. The greedy strat-
egy iterates for each service over the possible candidate hosts and picks the
first target until it is out of system resources, then moves over to the next un-
til all services are mapped. The highest-bidder strategy picks for each iteration
the target machine with the most system resources available. Similarly, the
lowest-bidder strategy picks the target with the least system resources capable
of running the service.
A cost effective division method
Additionally, we have support for a number of more complex algorithms,
such as an approximation algorithm of the minimum set cover problem [Hey-
darnoori et al., 2006]. This algorithm can be used to approximate the most
cost-effective deployment scenario when every server has a fixed price, no
matter how many services it will host.
A reliable division method
Another complex algorithm is an approximation algorithm for the Multi-
way Cut problem [Heydarnoori and Mavaddat, 2006], which can be used to
approximate a deployment scenario in which the number of network links
between services is minimised to make a system more reliable.
A division method for reliability testing
For system integration testing, it may be useful to apply an approximation
algorithm for vertex coloring, in which every node represents a service and
every color a machine in the infrastructure model. In the resulting distribu-
tion, every inter-dependency relationship between services corresponds to a physical
network link to another machine. It can be interesting to terminate such net-
work connections and verify whether the system is capable of recovering from
these network failures. The Dynamic Disnix framework implements the basic
approximation algorithm described in [Brélaz, 1979] for this purpose.
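To illustrate how candidate host selection and division can be composed into a complete policy, the sketch below chains the filter functions shown earlier in this section. The final divide call, its strategy argument and the requiredMem/mem attributes are hypothetical placeholders for any of the division methods described above.
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.divide {
  # hypothetical division step; "greedy" stands in for any division strategy
  strategy = "greedy";
  inherit services infrastructure;
  serviceProperty = "requiredMem"; targetProperty = "mem";
  distribution = filters.mapStatefulToPrevious {
    inherit services previousDistribution;
    distribution = filters.mapAttrOnAttr {
      inherit services infrastructure;
      serviceProperty = "requiredZone"; infrastructureProperty = "zone";
      distribution = filters.mapAttrOnList {
        inherit services infrastructure;
        distribution = initialDistribution;
        serviceProperty = "type"; infrastructureProperty = "supportedTypes";
      };
    };
  };
}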
5.6 EVALUATION
We have evaluated the case studies described in Section 5.2 in a cluster of
Xen virtual machines, each having the same amount of system resources. For
each case study, we have implemented a QoS model to deal with desirable
constraints, and we have used a separate division algorithm for each case.
We first performed an initial deployment in which the system is deployed
in a small network of machines. Then we trigger various events, such as
adding a server with specific capabilities or removing a server from the net-
work. For each event triggering a redeployment, we check whether the fea-
tures of the system are still accessible and measure the times of discovering
the network state, generating the distribution model, building the system,
distributing the services and transitioning to (activating) the new configuration.
Table 5.1 summarises the results of the evaluation for the four systems
described in Section 5.2.
5.6.1 StaffTracker (PHP/MySQL version)
The first is the StaffTracker (PHP/MySQL version). The size of all components
and dependencies of this system is 120 KiB. The quality-of-service model that
we defined for this system maps services of a specific type to the machines
supporting them (e.g. MySQL database to servers running a MySQL DBMS).
Moreover, the database storing the staff members is marked as stateful and
must not be relocated in case of a redeployment. The highest bidder strategy
is used to divide services over the candidate hosts.
Initially, the system is deployed in a network consisting of an Apache web
server machine and a MySQL machine, which takes quite some time. Then
we fire two events in which we add a machine to the network. Since Disnix
can build and deploy components safely next to other components, this gives
us no downtime. Only the transition phase (which only takes about 1.3-1.7
seconds) introduces some inconsistency, which is very short. Finally, we fire
two remove events. In this case, a new deployment is calculated and the
system configuration is adapted. During these events downtimes are a bit
longer (about 15 seconds in total), because the system must be partially rebuilt
and reconfigured, during which time those services are not available.
5.6.2 StaffTracker (Web services version)
Next is the SOA variant of the StaffTracker. The size of all components and
dependencies of this variant is 94 MiB (which is significantly larger than the
PHP version). We have used the same candidate host selection strategy as the
previous example and the minimum set cover approximation algorithm for
dividing services over the candidate hosts.
Similar results are observable compared to the previous example. The ini-
tial deployment also takes a significant amount of time. Events in which ma-
chines are added only take a couple of seconds. The failure events in this sce-
nario are very cheap and only take about 8 seconds of redeployment time (in-
cluding building, transferring and activating). These events are so cheap because
Disnix does not remove already deployed components from machines, unless
they are explicitly garbage collected. Because after these failure events the
system is essentially the same as in a previous configuration, we can switch
back nearly instantly.
5.6.3 ViewVC
In the ViewVC case, the size of all components and dependencies of the sys-
tem is 360 MiB. We used the same candidate host mapping method as in
the previous examples. We deployed ViewVC and its dependencies into a
network consisting of an Apache web server machine, a MySQL server ma-
chine and a Subversion machine. All Subversion repositories were marked as
stateful, so that they were not relocated in case of an event. For dividing the
services, we have used the greedy algorithm to use the maximum amount of
disk space of a server.
As the table shows, an add event only introduces a couple of seconds of
downtime. Event 3 takes a little longer, because a new Subversion repository
must be copied and transferred to a new Subversion server. Events 4 and 5
trigger a redeployment of the web application front-end and MySQL cache.
5.6.4 SDS2
The final evaluation subject is SDS2. For SDS2 we used, apart from the same
candidate host mapping criteria as the previous examples, a strategy that de-
ploys several services into a hospital environment and others in a Philips en-
vironment. The size of all components and dependencies of SDS2 is 599 MiB.
The division method was the multiway cut approximation algorithm, which
tries to find a distribution with a minimum number of network links between
services. For the initial deployment, we used 5 machines: 2 Apache Tomcat
servers (in the Philips and hospital environments), 2 MySQL servers (in the
Philips and hospital environments) and 1 Ejabberd server (in the hospital en-
vironment). The results of the deployment are comparable to the previous
examples, e.g., a long initial deployment time and a very short redeployment
time in case of a newly added server. The notable exception is the long re-
deployment time in case of a failing server. In these cases SDS2 had to be
completely rebuilt from source (apart from some shared library dependencies).
This is due to the fact that the SDS2 developers included a global configura-
tion file containing the locations of all services with each service. In case of
a change in the distribution, all services had to be rebuilt, repackaged and
redeployed on all machines in the network. This problem could have been
avoided by specifying small configuration files for each service which only
capture settings of its inter-dependencies.
5.6.5 General observations
Another notable general observation is the generation time for the initial de-
ployment step. These values were always close to 28 seconds for all case stud-
ies and much smaller during the redeployment steps. This is due to the fact
that these algorithms are invoked by the same build tooling used for Disnix,
which will first download some required dependencies in order to execute
the algorithms.
5.7 DISCUSSION
We have applied Disnix to several use cases and have shown that we can
efficiently and reliably redeploy a system in case of an event dealing with
desirable non-functional properties. However, there are some limitations in
using Disnix for self-adaptive deployment of distributed systems.
The first issue is applicability. Disnix is designed for the deployment of
service-oriented systems, that is, systems composed of distributable compo-
nents, which are preferably small and relocatable. This requires developers
to properly divide a system into distributable components. Some distributed
systems are composed of components that are so tightly integrated into the
environment, that it is infeasible to relocate the component in case of an event.
For instance, the system may require a restart or a complete adaptation of the
system configuration. Another issue is that some distributable components
may completely fail after a disconnection without recovering afterwards. Dis-
nix does not detect this behaviour. Moreover, Disnix uses a centralised ap-
proach, which means that components must be distributed and activated from
a central location, and the complete system must be captured in the models
used by Disnix. This is not possible for all types of distributed systems.
Disnix has only limited support for dealing with mutable state, such as
databases, Subversion repositories, caches and log files. For services such as
databases and Subversion repositories, Disnix can initialize a database schema
and initial data at startup time, but Disnix does not automatically migrate the
changes in the data made afterwards.
There are a couple of solutions to deal with this issue. A developer or
system administrator may want to keep a database on the same machine af-
ter the initial deployment. (This is why we provide the mapStatefulToPrevious
function in Section 5.5.) Another solution is to create a cluster of databases
that replicate the contents of the master database. In Chapter 9 we describe
a generic approach to deal with mutable state of services, which snapshots
mutable state and migrates it automatically as required.
Disnix currently only supports simple network topologies, in which every
target machine is directly reachable. There is also no method to model com-
plex network topologies yet. This prevents Disnix from using advanced de-
ployment planning algorithms, such as Avala [Mikic-Rakic et al., 2005], that
improve availability and take the network topology into account.
Furthermore, the contribution of our framework lies mainly in the mech-
anisms used for redeployment and the fact that components are treated as
“black boxes”. It would also be useful to integrate more sophisticated moni-
toring facilities and control loop implementations [Oreizy et al., 1998].
5.8 RELATED WORK
There is a substantial amount of work on self-adaptation in distributed en-
vironments. According to Cheng et al. [2008] the challenges of engineering
self-adaptive software systems can be divided into four categories: modelling
dimensions (goals, mechanisms and effects), requirements, engineering and
assurances. Our approach mainly belongs to the first category, since our
framework’s main purpose is to provide facilities to perform redeployment
in case of an event.
Kramer and Magee [2007] present a three-layer reference model based on
existing research work in which to articulate some outstanding research chal-
lenges in self-managed systems. The three layers are defined as goal manage-
ment, change management and component control. Our framework mainly
contributes to the change management layer and a bit to the component con-
trol layer, although our monitoring and control-loop aspects are relatively
primitive.
Several comparable approaches to ours involve specialized middleware.
For instance, [Caseau, 2005] describes using self-adaptive middleware to pro-
pose routing strategies of messages between components to meet require-
ments in a service-level agreement. In [Guennoun et al., 2008] a framework
of models for QoS adaptation is described that operates at the middleware
and transport levels of an application. In [Wichadakul and Nahrstedt, 2002]
the authors propose a translation system to develop QoS-aware applications,
which for instance support CPU scheduling.
Another approach is to reconfigure previously deployed components, such
as by changing the behaviour of components (e.g. [Camara et al., 2009]) or
their connections (such as [Chan and Bishop, 2009; Baresi and Pasquale, 2010]
for adapting compositions of web services).
Some approaches involving reconfiguration and redeployment are designed
for a specific component technology, such as [Oreizy et al., 1998] providing
an explicit architectural model and imperative language for modifying archi-
tectures consisting of C2 style components implemented in the Java program-
ming language. Moreover, the approach uses more primitive deployment fa-
cilities, e.g. it cannot automatically install components and their dependencies
on demand and without interference.
Compared to the previous approaches, Disnix is a more general approach
and does not require components of a distributed system to use a particular
component technology or middleware. Moreover, Disnix does not adapt com-
ponent behaviour or connections at runtime. Instead, it generates or builds
a new component and redeploys it. In some cases this takes more effort, but
the advantage is that Disnix tries to minimize this amount of effort and that
this approach can be used with various types of components regardless of
component technology. Moreover, if a crucial component breaks, the system
is redeployed to fix it, which reconfiguration approaches cannot al-
ways do properly. In some cases runtime adaptation may still be required,
which should be managed by other means, such as by implementing such
properties in the services themselves.
Some aspects of the present work are suggested in [Deb et al., 2008], such
as an application model describing components and connections, a network
model, and utility functions to perform a mapping of components to ma-
chines. Although the authors assess the usefulness of some utility function
approaches, they do not perform actual deployment of a system, nor describe
a framework that reacts to events.
Avala [Mikic-Rakic et al., 2005] is an approximation algorithm to improve
availability through redeployment. The authors of the paper mainly assess the
performance of the algorithm, but not the deployment mechanisms. DICOM-
PLOY [Rigole and Berbers, 2006] is a decentralised deployment tool for com-
ponents using the DRACO component model and uses Collaborative Rein-
forcement Learning (CRL) in the design of the component deployment frame-
work. In contrast to DICOMPLOY, Disnix uses a centralised approach. In some
cases DICOMPLOY may be more efficient in redeploying a system, but Disnix
is more general in that it supports various types of components and various
strategies for mapping components to machines.
Garlan et al. [2004] describe an architecture-based self-adaptation approach
having the ability to handle a wide variety of systems. A drawback of their
approach is that it is assumed that the system provides access hooks for moni-
toring and adapting, and that probes are available that provide information about
system properties. Besides our deployment mechanisms, we do not assume
that we can “hook” into already deployed components to alter their behaviour
or configuration. Instead, we redeploy components entirely (with a cache that
prevents us doing steps that have already been done before).
5.9 C O N C L U S I O N
We have shown a dynamic deployment extension built on top of Disnix, which
we have applied to several use cases. This deployment extension efficiently
and reliably redeploys a system in case of an event in the network (such as
a crashing machine) to keep the overall system working and to ensure that
desired non-functional requirements are met.
In the future we intend to improve this deployment system to properly
deal with mutable state of services, so that the data associated with a service is
automatically migrated in case of a redeployment. Moreover, Disnix has to
be extended to support complex network topologies, so that more sophisti-
cated deployment algorithms can be supported and we can integrate more
sophisticated monitoring facilities and control loops.
StaffTracker (PHP/MySQL)
State  Description                        Discovery (s)  Generation (s)  Build (s)  Distribution (s)  Transition (s)
0      Initial deployment                 1.0            27.9            31.9       1.2               1.6
1      Add MySQL server 2                 1.0            8.3             4.7        1.7               1.7
2      Add MySQL server 3                 1.0            4.4             4.5        1.2               1.3
3      Add Apache server 2                1.0            4.4             4.4        1.3               1.5
4      Remove MySQL server 2              1.0            8.2             4.6        1.5               1.8
5      Remove Apache server 1             1.0            8.8             4.4        0.9               1.2

StaffTracker (Web services)
0      Initial deployment                 1.0            28.6            169.8      13.1              4.0
1      Add MySQL server $500              1.0            10.7            17.7       6.1               2.9
2      Add Tomcat server $500             1.0            8.4             10.2       13.7              5.7
3      Remove Tomcat server $500          1.0            4.2             0.5        0.6               2.0
4      Remove MySQL server $500           1.0            4.2             0.5        1.2               1.8

ViewVC
0      Initial deployment                 1.0            28.5            555.9      9.5               17.5
1      Add MySQL server                   1.0            8.2             0.6        0.6               1.1
2      Add Apache server                  1.0            8.4             0.5        0.6               1.0
3      Add Subversion server + repo       1.0            8.4             6.3        3.7               8.2
4      Remove Apache server 1             1.0            8.4             4.6        7.5               1.9
5      Remove MySQL server 1              1.0            8.3             5.7        1.9               1.8

SDS2
0      Initial deployment                 1.0            28.6            642.9      122.8             147.7
1      Add MySQL server 2 (hospital)      1.0            6.7             0.6        19.1              1.9
2      Add Tomcat server 2 (Philips)      1.0            8.4             0.6        18.5              1.8
3      Remove Tomcat server 1 (Philips)   1.0            8.3             360.2      26.9              6.3
4      Remove MySQL server 1 (hospital)   1.2            10.5            370.4      120.8             124.3

Table 5.1 Evaluation times of the example cases
Part III
Infrastructure Deployment
6
Distributed Infrastructure Deployment
A B S T R A C T
In addition to the deployment of service components, the underlying sys-
tem configurations of software systems must also be deployed, either to physical
machines or to virtual machines hosted in an IaaS environment. As with service
deployment, developers and system administrators use automated tools. Con-
ventional infrastructure deployment tools often use convergent models. These
tools have various drawbacks, such as the difficulty of guaranteeing
that a model yields the same configuration on a different system, so that the
same configuration can be reproduced, and imperative actions that cannot
be trivially undone. In this chapter, we describe DisnixOS, a component pro-
viding complementary infrastructure deployment to Disnix. DisnixOS uses
charon, a network deployment tool built on top of NixOS.
6.1 I N T R O D U C T I O N
In the previous part of this thesis, we have covered service deployment as-
pects. In order to be able to properly use service components, we also require
the presence of the right infrastructure components. For example, machines
in the network must have the proper system resources and system services,
such as a database management system (DBMS) or an application server and
these must also be deployed by some means. Additionally, infrastructure
components must also satisfy various non-functional requirements, such as
performance and privacy concerns.
As with service deployment, infrastructure deployment is a complex
and laborious process. For example, in order to configure a single machine
in a network, an operating system and the right packages must be installed, all
required system services must be enabled, user accounts must be created and
configuration files must be created or adapted to contain all required settings.
This complexity is proportional to the number of machines in the network.
In addition to generic deployment steps, such as installing system ser-
vices, machine-specific deployment steps also have to be performed, such
as installing the right driver packages, configuring network interfaces with
the right IP addresses, and partitioning hard drives. Certain infrastructure
components may also have to be “tuned” to properly use the available system re-
sources and improve performance, for example by adjusting the amount of RAM
the DBMS may utilise.
System configurations can also be deployed in various environments. For
example, besides physical machines, people nowadays also use virtual ma-
chines, which are typically hosted in a data-center of an IaaS provider, such
as Amazon EC2. Virtual machines in the cloud usually provide simpler hard-
ware management and lower costs. Apart from installing new system con-
figurations, existing configurations also have to be upgraded, which is even
more difficult and may break a configuration. All these aspects introduce a
significant amount of complexity to the deployment process of networks of
machines.
In addition to the fact that a deployment process is complicated, it is also
difficult to reproduce a configuration of a network of machines in a different
environment. For example, it is not trivial to move a networked system con-
figuration from physical machines to an IaaS provider, or to move from one
IaaS provider to another and to be sure that the migrated configuration has
exactly the same properties.
To manage the complexity of deployment processes, developers and system
administrators typically use various automated deployment solutions, such
as Cfengine [Burgess, 1995], Puppet [Puppet Labs, 2011] or Chef [Opscode,
2011]. These tools make it possible to specify how to configure a network
of machines in a declarative language, which can be used to perform the
required deployment steps automatically.
However, these conventional deployment tools use convergent models cap-
turing what modification steps should be performed on machines in the net-
work according to a policy specification. Convergent models have a number
of drawbacks. For example, it is harder to write models that always yield
the same configuration regardless of the machine to which they are deployed. Further-
more, these tools also imperatively modify systems, which may not always
give the same result and these changes are not always trivial to undo. While
upgrading, there is also a time window in which the machine configurations
are inconsistent.
In this chapter, we describe DisnixOS, a component providing complemen-
tary infrastructure deployment to Disnix in addition to service deployment.
Instead of capturing how system configurations should be modified, it cap-
tures various aspects that constitute a system (such as the system services)
and can separate logical properties of systems from physical characteristics.
From these specifications, a complete system configuration for a specific tar-
get is produced, consisting of all packages, configuration files and activation
scripts required to deploy the configuration.
Apart from mutable state, the actual configuration always matches the in-
tended configuration defined in the deployment specification, which can be
reproduced anywhere without side effects. Moreover, due to the purely func-
tional nature of the underlying deployment tooling, we can also reliably up-
grade networks of machines. DisnixOS uses charon [Dolstra et al., 2013], a
network deployment tool built on top of NixOS, for infrastructure deployment
and provisioning of virtual machines in an IaaS environment, such as Amazon
EC2.
6.2 C O N V E R G E N T D E P L O Y M E N T: C F E N G I N E
The best known deployment tool for management of system configurations
is probably Cfengine [Burgess, 1995]. The purpose of Cfengine is to provide
automated configuration and maintenance of machines in a network from a
policy specification. From this specification, Cfengine performs a number of
imperative deployment actions, such as copying files, modifying file contents,
creating symlinks and invoking arbitrary tools in a network of machines (or a
subset belonging to a particular class).
Figure 6.1 shows a partial Cfengine model for configuring an Apache web
server:
96 This is a template generating a common bundle defining what bundles
must be processed in what order. In our example, we only process the
web_server bundle.
98 A bundle captures a specific system configuration aspect. The web_server
bundle is a promise bundle (because the agent type is used) and
takes care of various aspects of the Apache web server, such as starting
and stopping.
99 Identifiers followed by a colon declare sections within a bundle. Here, a
vars section is defined, assigning values to variable identifiers.
100 The processes section manages processes. Here, we determine the status
of the apache2 process. If the user has requested that the server must
be (re)started and if the Apache server is properly configured, Cfengine
executes the start_apache command (defined later in this model). If the
user has requested that the server must be stopped, it invokes the stop
step of the Apache init.d script.
101 An identifier followed by a double colon defines a class identifier. Classes
are true or false design decisions, which can be either hard classes, us-
ing properties discovered at runtime, such as time stamps, hostnames,
operating system names and process lists, or soft classes, which are user-
defined combinations of other classes or results of promises.
102 The body of a section contains promises, which describe a state in which
we want to end up. Cfengine tries to converge to that given state. Promises
can capture various properties, such as the contents of a file or the result
of a tool invocation. Furthermore, state descriptions can also be pro-
duced by other bundles. This particular promise checks whether Apache
has to be (re)started and yields a true decision for the start_apache class,
if this is the case.
103 The promise in the commands section intends to start the Apache web
server. The class decision has been made earlier in 102 .
body common control {                                               96
  bundlesequence => { "web_server" };                               97
}

bundle agent web_server(state) {                                    98
  vars:                                                             99
    "document_root" string => "/var/www";
    "site_http_conf" string => "/home/sander/httpd.conf";
    "match_package" slist =>
      { "apache2", "apache2-mpm-prefork", "apache2-proxy-balancer" };

  processes:                                                        100
    web_ok.on::                                                     101
      "apache2" restart_class => "start_apache";                    102
    off::
      "apache2" process_stop => "/etc/init.d/apache2 stop";

  commands:                                                         103
    start_apache:: "/etc/init.d/apache2 start";

  packages:                                                         104
    "$(match_package)"
      package_policy => "add",
      package_method => rpm,
      classes => if_ok("software_ok");

  files:                                                            105
    software_ok::
      "/etc/sysconfig/apache2"
        edit_line => fixapache,
        classes => if_ok("web_ok");
}

bundle edit_line fixapache {                                        106
  vars:
    "add_modules" slist => { "jk", "proxy_balancer" };
    "del_modules" slist => { "php4", "php5" };

  insert_lines:
    "APACHE_CONF_INCLUDE_FILES=\"$(web_server.site_http_conf)\"";

  field_edits:
    "APACHE_MODULES=.*"
      edit_field => quotedvar("$(add_modules)","append");
    "APACHE_MODULES=.*"
      edit_field => quotedvar("$(del_modules)","delete");
}
Figure 6.1 A Cfengine code fragment to configure an Apache web server
104 The promise in the packages section checks whether several required
RPM packages are present, such as the Apache web server and several
Apache modules. If these are not present, it invokes the RPM package
manager [Foster-Johnson, 2003] on the target machine, to install them.
If the installation succeeds and all packages are present, Cfengine yields
a true decision for the software_ok class.
105 The promise in the files section checks whether the Apache system con-
figuration file contains a number of required settings and updates them
if they are not there. This promise is only executed if all the required
Apache software packages are present, which should be achieved by the
previous promise 104 . If this process has achieved the intended result,
it yields a true decision for the web_ok class. The actual modification of
the configuration file is defined in the fixapache bundle, shown later in
this model.
106 This bundle contains promise body parts, which are used to modify the
Apache system configuration file. In our example, this bundle adds a
reference in the system configuration file to the web server configuration
file, removes a number of Apache modules (php4 and php5) from the
Apache configuration and adds several other modules: mod_jk and
mod_proxy_balancer.
6.2.1 Similarities
Cfengine models are declarative models in the sense that a developer or system
administrator writes promises, which describe the desired state that must be
achieved, instead of the steps that have to be performed. From these collec-
tions of promises, Cfengine tries to converge [Traugott and Brown, 2002] to the
described state. First, Cfengine determines whether it has already achieved
the intended state and, if this is not the case, it performs the required
steps to bring the system as close to the desired state as possible.
If we compare the Cfengine concepts to the goals of our reference archi-
tecture for distributed software deployment, it achieves a number of similar
goals:
They are designed to fully automate deployment tasks, reducing de-
ployment times and the chance of human error.
Both approaches intend to support integration of custom tools and to
work on various types of components and systems.
Making deployment processes efficient is also an important ingredient;
Cfengine always checks whether the intended state has been reached,
and only performs the steps that are necessary to reach it.
6.2.2 Differences
The convergent deployment approach also has a number of drawbacks com-
pared to our deployment vision:
Cfengine has several reliability weaknesses. One of the issues is that it
is not trivial to ensure dependency completeness. For example, it may
be possible to successfully deploy an Apache web server configuration,
while not specifying all the package dependencies on the Apache mod-
ules that the configuration requires. For example, in Figure 6.1, we have
defined that we want to use the mod_jk and mod_proxy_balancer Apache
modules, but we did not specify that we want to install the RPM pack-
ages providing these modules.
Deploying such a configuration may yield a working system on a ma-
chine that happens to have the required packages installed and
an inconsistent configuration on a machine that does not have these
packages installed. Developers and system administrators have to take
care that these dependencies are present, and there is no means to stati-
cally verify them.
Deployment steps are also performed imperatively, as files are overwrit-
ten or removed. Therefore, it is hard to ensure purity and to undo these
actions 1 . While these imperative steps are performed, a system is also
temporarily inconsistent, as packages and configuration files may be-
long to both an old configuration and a new configuration. During this
phase system services can also be stopped and started, rendering parts
of a system unavailable.
It is also difficult to ensure reproducibility. Because modifications depend
on the current state of the system, actions that result from a Cfengine
model may produce a different result on a different system, e.g. because
the initial contents of system configuration files are different. Also im-
plicit dependencies that are required, but not declared in the Cfengine
model, may make it difficult to reproduce a configuration. Further-
more, certain configuration properties are platform specific. For ex-
ample, RPM packages with the same name may not provide the same
functionality or compatibility on a different operating system or a newer
version of a Linux distribution.
Typically, because of these drawbacks, Cfengine users have to change a
Cfengine model and reconfigure networks of systems several times (es-
pecially in heterogeneous networks) to see whether the desired results are
achieved, and they must take extra care that all dependencies are present.
6.3 O T H E R D E P L O Y M E N T A P P R O A C H E S
Many deployment tools have been inspired by Cfengine, providing extra and
improved features, such as Puppet [Puppet Labs, 2011] and Chef [Opscode,
2011]. An additional feature of Puppet is “transactional” upgrades: it tries
to simulate the deployment actions first (e.g. by checking whether a file can
actually be copied or adapted) before performing the actual deployment process.
This technique is similar to the approach of [Cicchetti et al., 2009] for making
maintainer scripts of RPM and Debian packages transactional. Although simulation re-
duces the risk of errors, interruption of the actual deployment process may
still render a system configuration inconsistent.
1
Of course, Cfengine can also be instructed to make a backup, but these practices are uncom-
mon and require extra specification that makes these models bigger and more complicated.
Apart from Cfengine-based tools, there are also other tools used for dis-
tributed deployment. MLN [Begnum, 2006] is a tool for managing large net-
works of virtual machines and has a declarative language to specify arbitrary
topologies. It does not manage the contents of VMs beyond a templating
mechanism.
ConfSolve [Hewson et al., 2012] is a tool that can be used to model sys-
tem configuration aspects in a class-based Object-Oriented domain-specific
language. A model can be translated into a Constraint Satisfaction Problem
(CSP) to generate valid system configurations. ConfSolve’s approach is to
compose system configurations based on constraints, but it is not focused on the
mechanics needed to reliably deploy such configurations.
There are also many distributed application component deployment tools
available, such as the Software Dock [Hall et al., 1999] and Disnix [van der
Burg and Dolstra, 2010a]. These tools manage the deployment of applica-
tion components (e.g. web services) but not their underlying infrastructure
components and their settings. These tools are actually related to aspects
described in the previous part of this thesis, covering service deployment as-
pects.
6.4 C O N G R U E N T D E P L O Y M E N T
In the previous section, we have explained how conventional tools with con-
vergent models work and what their drawbacks are in reaching our deploy-
ment vision supporting reliable, reproducible and efficient deployment.
In this section, we explain how we have implemented an approach using
congruent models. We have achieved this goal by expanding upon earlier
work: NixOS, described in Section 2.3 of this thesis.
As we have seen, NixOS manages complete system configurations, includ-
ing all packages, boot scripts and configuration files, from a single declarative
specification. Furthermore, because NixOS uses Nix, it has unique features
such as the ability to store static components safely in isolation from each
other, so that upgrades are reliable.
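To recapitulate the style of such specifications (a minimal, purely illustrative sketch;
the option names below are ordinary NixOS options that also appear in the figures of
this chapter, while the concrete values are examples), a single-machine NixOS
configuration could look as follows:

  { config, pkgs, ... }:

  {
    # Declare the desired system services; NixOS generates the
    # corresponding packages, configuration files and boot scripts.
    services.postgresql.enable = true;
    services.httpd.enable = true;
    services.httpd.adminAddr = "root@localhost";

    # End-user packages that must be present in the system environment.
    environment.systemPackages = [ pkgs.subversion ];
  }

The network models described next generalise this idea from a single machine to a
set of machines.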
6.4.1 Specifying a network configuration
In order to achieve congruent deployment, we have to perform our deploy-
ment process in an unconventional manner and we have to specify our in-
tended configurations differently.
First, we have to express what the properties of the systems in the network
are, instead of what changes we want to carry out. The models used by Dis-
nixOS and charon are declarative about the composition of system components
whereas the models used by Cfengine (and derivatives) are declarative about
the intended result of imperative actions. From our specifications, which are
declarative about the system properties, DisnixOS decides which components
must be deployed and which can remain untouched.
{                                                                   107
  storage =                                                         108
    {pkgs, config, nodes, ...}:
    {
      services.portmap.enable = true;
      services.nfs.server.enable = true;
      services.nfs.server.exports = ''
        /repos *(rw,no_root_squash)
      '';
    };

  postgresql =                                                      109
    {pkgs, config, nodes, ...}:
    {
      services.postgresql.enable = true;
      services.postgresql.enableTCPIP = true;
    };

  webserver =                                                       110
    {pkgs, config, nodes, ...}:
    {
      fileSystems = pkgs.lib.mkOverride 50
        [ { mountPoint = "/repos";
            device = "${nodes.storage.networking.hostName}:/repos";  111
            fsType = "nfs";
            options = "bootwait"; }
        ];

      services.portmap.enable = true;
      services.nfs.client.enable = true;
      services.httpd.enable = true;
      services.httpd.adminAddr = "root@localhost";
      services.httpd.extraSubservices = [
        { serviceType = "trac"; }
      ];

      environment.systemPackages = [ pkgs.pythonPackages.trac pkgs.subversion ];
    };
}
Figure 6.2 network-logical.nix: A partial logical configuration of a Trac environment
Second, we have to separate logical and physical characteristics of the ma-
chines in the network to improve retargetability. Typically, when configuring
machines, many aspects of the configuration are the same, such as installing
end-user packages and installing system services, regardless of the type of
machine we want to deploy. Some other aspects are specific to the system,
such as driver packages, network settings, hard drive partitions and machine
specific performance tunings.
Logical specification
Figure 6.2 shows a partial example of a congruent model defined in the
Nix expression language describing a logical configuration of a Trac environ-
ment 2 , a web application system for managing software development projects:
2
http://trac.edgewall.org
107 The model in Figure 6.2 represents an attribute set of the form { name_1 =
value_1; name_2 = value_2; ...; name_n = value_n; }, in which each attribute name
represents a machine in the network and each attribute value captures a
logical configuration aspect of the machine.
108 Every attribute value refers to a NixOS module, capturing the logical
configuration aspects of a system. As we have seen earlier in Sec-
tion 2.3.3, NixOS modules are functions taking several arguments. The
pkgs argument refers to Nixpkgs, a collection of over 2500 packages that
can be automatically deployed by Nix and the config argument refers to
an attribute set capturing all configuration properties of the system.
In a network specification, such as Figure 6.2, NixOS modules may also
take the nodes parameter which can be used to retrieve properties from
the configuration of any machine in the network. In the body of each
NixOS module, we define an attribute set capturing various configura-
tion aspects.
For the storage machine, we have specified that we want to enable the
NFS server and the RPC portmapper (which is required by NFS) system
services. Furthermore, we define the /repos directory as an NFS export,
whose purpose is to store Subversion repositories that must be re-
motely accessible.
109 For the postgresql machine we have defined that we want to enable the
PostgreSQL database server, which should also accept TCP/IP connec-
tions in addition to Unix domain socket connections.
110 The webserver machine is configured to have an Apache HTTP server
instance hosting the Trac web application. Furthermore, it also uses
an NFS client to access the Subversion repositories stored on the storage
machine.
111 This line specifies the NFS device containing the hostname of the storage
system and the repos directory containing Subversion repositories that
we want to access. The nodes parameter is used to retrieve the hostName
property from the storage system.
Physical specification
The specification in Figure 6.2 captures the logical configuration of a network
of machines. All the properties in this model are usually the same for any type
of machine in a network to which we want to deploy the configuration.
In order to perform the complete deployment process of a network of ma-
chines, the specification shown in the previous section is not sufficient. Be-
sides logical properties, we also have to provide machine-specific physical prop-
erties, such as network settings, hard drive partitions and boot loader settings,
which typically differ across various types of machines.
Figure 6.3 shows an example of a physical network configuration. The
structure of this model is the same as the logical specification shown in Fig-
ure 6.2, except that it captures machine specific settings. For example, we
{
database =
{ pkgs, config, nodes, ...}:
{
boot.loader.grub = {
version = 2;
device = "/dev/sda";
};
boot.initrd.kernelModules = [ "ata_piix" "megaraid_sas" ];
fileSystems = [
{ mountPoint = "/"; device = "/dev/sda1"; }
];
nixpkgs.system = "i686-linux";
networking.hostName = "database.example.org";
services.disnix.enable = true;
...
};
storage = ...
webserver = ...
}
Figure 6.3 network-physical.nix: A partial physical network configuration for a Trac
environment
have defined that the MBR of the GRUB bootloader should be stored on the
first hard drive, that we need several kernel modules supporting a particular
RAID chipset, that the first partition of the first hard drive is the root parti-
tion, that the system is a 32-bit Intel Linux machine (i686-linux) and that the
host name of the machine is: database.example.org. As we have seen earlier in
Section 2.3, these attributes are all NixOS configuration properties.
Furthermore, in order to be able to use Disnix and to be able to activate a
service component, we must enable the Disnix service. The corresponding
NixOS module ensures that all supported activation types are configured.
For example, if the Apache HTTP service is installed, then the Disnix NixOS
module triggers the installation of the apache-webapplication activation script.
Besides physical characteristics of machines, we can also annotate physical
models with IaaS provider properties, such as Amazon EC2 specific settings,
so that virtual machines can be automatically instantiated with the appropri-
ate system resources, such as the number of CPU cores, persistent storage
space, etc. The RELENG paper about charon has more details on the instanti-
ation and provisioning of virtual machines [Dolstra et al., 2013].
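As a rough illustration of such annotations (the deployment.* option names follow
charon's EC2 support and should be treated as assumptions here; [Dolstra et al., 2013]
documents the authoritative set of options), a physical machine model could be
extended as follows:

  {
    webserver =
      { pkgs, config, nodes, ... }:
      {
        # Instantiate this machine as an Amazon EC2 virtual machine.
        deployment.targetEnv = "ec2";
        deployment.ec2.region = "eu-west-1";       # data centre in which to create the VM
        deployment.ec2.instanceType = "m1.small";  # determines CPU cores and RAM
      };
    ...
  }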
6.5 I M P L E M E N T A T I O N O F T H E I N F R A S T R U C T U R E D E P L O Y M E N T P R O C E S S
In the previous section, we have described how to define various concerns of
networks of machines in separate models. DisnixOS uses charon to perform
the deployment of the system configurations. Apart from the fact that NixOS
configurations are merged using the NixOS module system and charon is ca-
pable of instantiating VMs in a cloud provider environment (and uploading
bootstrap NixOS disk images), the steps performed by charon are roughly the
same as those of Disnix (described earlier in Chapter 4), e.g. it also builds the
components of entire system configurations first, then it transfers the closures and
finally it activates the system configurations. If any of the activation steps fails, a
rollback is performed. As entire system configurations (similar to services in
the previous chapter) are also Nix components, we can safely store them in
isolation from each other.

Figure 6.4 Architecture of disnixos-env
6.6 U S A G E
The usage of DisnixOS is similar to the basic Disnix toolset. The major differ-
ence is that instead of an infrastructure model, DisnixOS takes a collection of
network specifications capturing several machine configuration aspects of all
machines in the network. These are exactly the same specifications used by
charon. Running the following command:
$ disnixos-env -s services.nix -n network-logical.nix \
-n network-physical.nix -d distribution.nix
performs the complete deployment process, in which both the services of
which the system is composed and the system configurations of the machines
in the network are built from source code. Then the NixOS configurations in the
network are upgraded, and finally the service-oriented system is deployed the
same way the regular disnix-env command does.
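The distribution.nix model referenced above has the same shape as the distribution
models used by the regular Disnix toolset: it maps each service onto one or more
machines declared in the network models. A minimal, purely illustrative sketch (the
service names are hypothetical) targeting the machines of Figure 6.2 could look like this:

  {infrastructure}:

  {
    # Map each (hypothetical) service to a machine from the network models.
    TracAdapter  = [ infrastructure.webserver ];
    MonitoringDB = [ infrastructure.postgresql ];
  }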
6.7 A R C H I T E C T U R E
Figure 6.4 shows the architecture of the disnixos-env tool. First disnixos-env in-
vokes charon to deploy the network of machines defined in the network speci-
fications, each capturing a specific concern.
{
test1 =
{pkgs, config, nodes, ...}:
{
services.tomcat.enable = true;
nixpkgs.system = "x86_64-linux";
disnix.infrastructure.zone = "philips";
};
test2 =
{pkgs, config, nodes, ...}:
{
services.tomcat.enable = true;
services.mysql.enable = true;
services.mysql.rootPassword = ./mysqlpw;
services.mysql.initialScript = ./mysqlscript;
nixpkgs.system = "i686-linux";
disnix.infrastructure.zone = "hospital";
};
}
Figure 6.5 A NixOS network configuration
{
test1 = {
tomcatPort = 8080;
supportedTypes = [ "process" "wrapper" "tomcat-webapplication" ];
system = "x86_64-linux";
zone = "philips";
};
test2 = {
tomcatPort = 8080;
mysqlPort = 3306;
mysqlUsername = "root";
mysqlPassword = builtins.readFile ./mysqlpw;
supportedTypes = [ "process" "wrapper" "tomcat-webapplication" "mysql-database" ];
system = "i686-linux";
zone = "hospital";
};
}
Figure 6.6 A generated infrastructure expression
After charon has successfully deployed the system configurations, the disnixos-
geninfra tool is invoked, whose purpose is to convert the given network speci-
fications into a Disnix infrastructure model.
The difference between the network specifications that charon uses and the
infrastructure model that Disnix uses is that the charon models are explicit
models, which can be used to derive complete configurations, whereas the
Disnix infrastructure model is implicit; it reflects the infrastructure properties
as they should be, but the developer or system administrators have to make
sure that these properties are correct. Furthermore, the infrastructure model
only has to contain properties required for service deployment (not all system
properties) and can also be used to represent target machines deployed by
other means (whereas network models can only be used by NixOS).
To determine what infrastructure properties must be provided, various
NixOS configuration settings are examined. For example, if the
services.tomcat.enable property is enabled (which enables Apache Tomcat), the
resulting infrastructure model contains the Apache Tomcat port number (tom-
catPort), as well as the tomcat-webapplication type in the supportedTypes list. Fur-
thermore, in order to provide non-functional properties to the generated in-
frastructure model, end-users can define custom attributes in the disnix.infrastructure
NixOS configuration attribute set. Figure 6.5 shows an example network con-
figuration and Figure 6.6 shows the resulting infrastructure model.
After the infrastructure model is generated, the regular disnix-env tool is
invoked, which takes care of deploying all services. Disnix takes the user-
provided services and distribution models as well as the generated
infrastructure model as input parameters.
If any of these previous steps fail, a rollback is performed which restores
both the previous service deployment state as well as the previous infrastruc-
ture configuration.
6.8 E X P E R I E N C E
We have used DisnixOS for the deployment of a number of test subjects,
which we have captured in logical network models. In addition to the Dis-
nix examples described in Chapter 4, we have also deployed a collection of
WebDSL applications running on Java Servlet containers, such as Yellow-
grass (http://yellowgrass.org), Researchr (http://researchr.org)
and webdsl.org (http://webdsl.org), a Mediawiki environment
(http://mediawiki.org) (in which we have defined a web server hosting the Wiki
application and a server running a PostgreSQL server to store the Wiki con-
tents), a Trac environment (http://trac.edgewall.org), and a Hydra
build environment (http://hydra.nixos.org), a continuous build and
integration cluster consisting of 11 machines.
We were able to deploy the logical network configurations to various net-
works with different characteristics, such as a network consisting of physical
machines and QEMU and VirtualBox VMs as well as various IaaS provider
environments, such as Amazon EC2 and Linode. Furthermore, in all deploy-
ment scenarios, we never ran into an inconsistent state, due to missing de-
pendencies or a situation in which a configuration contains components from
an older system configuration and a newer system configuration at the same
time.
6.9 D I S C U S S I O N
In this chapter, we have explained how we achieve congruent deployment
with DisnixOS for static parts of a system. There are some desired properties
that we did not achieve.
As with Disnix for service deployment, DisnixOS and charon are able to
manage static parts of a system in isolation in the Nix store. However, the
underlying deployment model is not capable of dealing with mutable state,
such as databases. Currently, the only way we deal with state is by generating
self-initialising activation scripts, which (for example) import a schema on ini-
tial startup. If we migrate a deployed web application from one environment
to another, the data is not automatically migrated.
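To make this limitation concrete, the following Nix sketch shows the flavour of such
a self-initialising script; it is purely illustrative (the database name, schema file and
marker path are made up) and not the actual implementation generated by Disnix:

  { pkgs, schema ? ./schema.sql }:

  pkgs.writeScript "initialise-database" ''
    #! ${pkgs.stdenv.shell} -e
    # Import the schema only on the very first activation. Any data that
    # is modified afterwards stays on this machine and is not migrated
    # automatically when the service is deployed elsewhere.
    if [ ! -e /var/lib/mydb-initialised ]; then
        mysql -u root mydb < ${schema}
        touch /var/lib/mydb-initialised
    fi
  ''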
In order to deploy systems with DisnixOS or charon, all components of a
system must be captured in Nix expressions so that they can be built entirely
by Nix and isolated in the Nix store. They are not designed for adapting
configurations not managed by Nix, because we cannot safely manage com-
ponents deployed by other means in a purely functional manner.
6.10 L I M I T A T I O N S
DisnixOS is built on top of NixOS, which is a Linux-based system. For many
operating systems (especially proprietary ones), system services are so tightly
integrated with the operating system that they cannot be managed through
Nix (upon which Disnix is based). The DisnixOS extension is not suitable
for deployment of networks using other types of Linux distributions or other
operating systems, such as FreeBSD 3 or Windows, because their system
configurations cannot be completely managed by Nix.
On proprietary infrastructure, such as Windows, the basic Disnix toolset
can still be used for deploying service components, but infrastructure compo-
nents have to be deployed by other means, e.g. manually or by using other
deployment tools.
6.11 C O N C L U S I O N
In this chapter, we have explained how distributed deployment tools with
convergent models work, and we have presented DisnixOS, which provides comple-
mentary infrastructure deployment to Disnix using congruent models
that always match the intended configuration for the static parts of a system. We
have used this tool to deploy various system configurations in various envi-
ronments, such as the Amazon EC2 cloud.
3
As FreeBSD is free and open source, it may also be possible to implement a FreeBSD-based
NixOS; however, nobody has done this so far.
7
Automating System Integration Tests for
Distributed Systems
A B S T R A C T
Automated regression test suites are an essential software engineering prac-
tice: they provide developers with rapid feedback on the impact of changes to
a system’s source code. The inclusion of a test case in an automated test suite
requires that the system’s build process can automatically provide all the en-
vironmental dependencies of the test. These are external elements necessary for
a test to succeed, such as shared libraries, running programs, and so on. For
some tests (e.g., a compiler’s), these requirements are simple to meet.
However, many kinds of tests, especially at the integration or system level,
have complex dependencies that are hard to provide automatically, such as
running database servers, administrative privileges, services on external ma-
chines or specific network topologies. As such dependencies make tests diffi-
cult to script, they are often only performed manually, if at all. This particu-
larly affects testing of distributed systems and system-level software.
In this chapter, we show how we can automatically instantiate the complex
environments necessary for tests by creating (networks of) virtual machines
on the fly from declarative specifications. Building on NixOS, a Linux distri-
bution with a declarative configuration model, these specifications concisely
model the required environmental dependencies. We also describe techniques
that allow efficient instantiation of VMs. As a result, complex system tests be-
come as easy to specify and execute as unit tests. We evaluate our approach
using a number of representative problems, including automated regression
testing of a Linux distribution.
Furthermore, we also show how to integrate these techniques with Disnix,
so that apart from deploying service-oriented systems in real networks, they
can also be continuously tested in a virtual environment.
7.1 I N T R O D U C T I O N
Automated regression test suites are an essential software engineering prac-
tice, as they provide developers with rapid feedback on the impact of changes
to a system’s source code. By integrating such tests in the build process of a
software project, developers can quickly determine whether a change breaks
some functionality. These tests are easy to realise for certain kinds of software:
for instance, a regression test for a compiler simply compiles a test case, runs
it, and verifies its output; similarly, with some scaffolding, a unit test typically
just calls some piece of code and checks its result.
However, other regression tests, particularly at the integration or system
level, are significantly harder to automate because they have complex require-
ments on the environment in which the tests execute. For instance, they might
require running database servers, administrative privileges, services on exter-
nal machines or specific network topologies. This is especially a concern for
distributed systems and system-level software. Consider for instance the fol-
lowing motivating examples used in this chapter:
OpenSSH is an implementation of the Secure Shell protocol that allows
users to securely log in on remote systems. One component is the pro-
gram sshd, the secure shell server daemon, which accepts connections
from the SSH client program ssh, handles authentication, starts a shell
under the requested user account, and so on. A test of the daemon must
run with super-user privileges on Unix (i.e. as root) because the daemon
must be able to change its user identity to that of the user logging in. It
also requires the existence of several user accounts.
Quake 3 Arena is a multiplayer first-person shooter video game. An auto-
mated regression test of the multiplayer functionality must start a server
and a client and verify that the client can connect to the server success-
fully. Thus, such a test requires multiple machines. In addition, the
clients (when running on Unix) require a running X11 server for their
graphical user interfaces.
Transmission is a Bittorrent client, managing downloads and uploads of
files using the peer-to-peer Bittorrent protocol. Peer-to-peer operation
requires that a running Bittorrent client is reachable by remote Bittor-
rent clients. This is not directly possible if a client resides behind a
router that performs Network Address Translation (NAT) to map inter-
nal IP addresses to a single external IP address. Transmission clients
automatically attempt to enable port forwarding in routers using the
UPnP-IGD (Universal Plug and Play Internet Gateway Device) proto-
col. A regression test for this feature thus requires a network topology
consisting of an IGD-enabled router, a client “behind” the router, and a
client on the outside. The test succeeds if the second client can connect
to the first through the router.
Thus, each of these tests requires special privileges, system services, exter-
nal machines, or network topologies. This makes them difficult to include in
an automated regression test suite: as these are typically started from (say) a
Makefile on a developer’s machine prior to checking in changes, or on a con-
tinuous build system, it is important that they are self-contained. That is, the
test suite should set up the complete environment that it requires. Without
this property, test environments must be set up manually. For instance, we
could manually set up a set of (virtual) machines to run the Transmission test
suite, but this is laborious and inflexible. As a result, such tests tend to be
done on an ad hoc, semi-manual basis. For example, many major Unix system
packages (e.g. OpenSSH, the Linux kernel or the Apache web server) do not
have automated test suites.
In this chapter, we describe an approach to make such tests as easy to
write and execute as conventional tests that do not have complex environ-
mental dependencies. This opens a whole class of automated regression tests
to developers. In this approach, a test declaratively specifies the environment
necessary for the test, such as machines and network topologies, along with
an imperative test script. From the specification of the environment, we can
then automatically build and instantiate virtual machines (VMs) that implement
the specification, and execute the test script in the VMs.
We achieve this goal by expanding on our previous work on NixOS. As we
have seen earlier in Section 2.3, in NixOS the entire operating system (sys-
tem packages such as the kernel, server packages such as Apache, end-user
packages such as Firefox, configuration files in /etc, boot scripts, and so
on) is built from source from a specification in what is in essence a purely
functional “Makefile”. The fact that Nix builds from a purely functional spec-
ification means that configurations can easily be reproduced.
The latter aspect forms the basis for this chapter. In a normal NixOS sys-
tem, the configuration specification is used to build and activate a configura-
tion on the local machine. In Section 7.2, we show how these specifications can
be used to produce the environment necessary for running a single-machine
test. We also describe techniques that allow space and time-efficient instanti-
ation of VMs. In Section 7.3, we extend this approach to networks of VMs. In
Section 7.4, we show how we can integrate the distributed testing techniques
with Disnix, to automatically test service-oriented systems deployed by Dis-
nix. We discuss various aspects and applications of our work in Section 7.5,
including distributed code coverage analysis and continuous build systems.
We have applied our approach to several real-world scenarios; we quantify
this experience in Section 7.6.
7.2 S I N G L E - M A C H I N E T E S T S
NixOS system configurations allow developers to concisely specify the en-
vironment necessary for an integration or system test and instantiate virtual
machines from these, even if such tests require special privileges or running
system services. After all, inside a virtual machine, we can do any actions
that would be dangerous or not permitted on the host machine. We first ad-
dress single-machine tests; in the next section, we extend this to networks of
machines.
7.2.1 Specifying and running tests
Figure 7.1 shows an implementation of the OpenSSH regression test described
in Section 7.1:
113 The specification consists of two parts. The machine attribute refers to
a NixOS configuration, as we have seen in Section 2.3; a declarative
let openssh = stdenv.mkDerivation { ... };
in
makeTest {                                                          112
  machine =
    { config, pkgs, ... }:                                          113
    {
      users.extraUsers =
        [
          { name = "sshd"; home = "/var/empty"; }
          { name = "bob"; home = "/home/bob"; }
        ];
    };

  testScript = ''                                                   114
    $machine->succeed(                                              115
      "${openssh}/bin/ssh-keygen -f /etc/ssh/ssh_host_dsa_key",
      "${openssh}/sbin/sshd -f /dev/null",
      "mkdir -m 700 /root/.ssh /home/bob/.ssh",
      "${openssh}/bin/ssh-keygen -f /root/.ssh/id_dsa",
      "cp /root/.ssh/id_dsa.pub /home/bob/.ssh/authorized_keys");
    $machine->waitForOpenPort(22);                                  116
    $machine->succeed("${openssh}/bin/ssh bob\@localhost 'echo \$USER'") eq "bob\n" or die;
  '';
}
Figure 7.1 openssh.nix: Specification of an OpenSSH regression test
specification of the machine in which the test is to be performed. The
machine specification is very simple: all that we need for the test beyond
a basic NixOS machine is the existence of two user accounts (sshd for the
SSH daemon’s privilege separation feature, and bob as a test account for
logging in).
114 The testScript attribute is an imperative test script implemented in Perl
performing operations in the virtual machine (the guest) using a number
of primitives. The OpenSSH test script creates an SSH host key (required
by the daemon to allow clients to verify that they are connecting to the
right machine), starts the daemon, creates a public/private key pair, add
the public key to Bob’s list of allowed keys, waits until the daemon is
ready to accept connections, and logs in as Bob using the private key.
Finally, it verifies that the SSH session did indeed log in as Bob and
signals failure otherwise.
115 The succeed primitive executes shell commands in the guest and aborts
the test if they fail.
116 waitForOpenPort waits until the guest is listening on the specified TCP
port.
112 The function makeTest applied to machine and testScript evaluates to two
attributes: vm, a derivation that builds a script to start a NixOS VM
matching the specification in machine; and test, a derivation that depends
on vm, runs its script to start the VM, and then executes testScript.
The following command performs the OpenSSH test:
$ nix-build openssh.nix -A test
That is, it builds the OpenSSH package as well as a complete NixOS instance
with its hundreds of dependencies. For interactive testing, a developer can
also do:
$ nix-build openssh.nix -A vm
$ ./result/bin/run-vm
(The call to nix-build leaves a symbolic link result to the output of the vm deriva-
tion in the Nix store.) This starts the virtual machine on the user’s desktop,
booting a NixOS instance with the specified functionality.
7.2.2 Implementation
Two important practical advantages of our approach are that the implemen-
tation of the VM requires no root privileges and is self-contained (openssh.nix
and the expressions that build NixOS completely describe how to build a VM
automatically). These properties allow such tests to be included in an auto-
mated regression test suite.
Virtual machines are built by the NixOS module qemu-vm.nix that defines
a configuration value system.build.vm, which is a derivation to build a shell
script that starts the NixOS system built by system.build.toplevel in a virtual ma-
chine. We use QEMU/KVM (http://www.linux-kvm.org/), a modified
version of the open source QEMU processor emulator that uses the hardware
virtualisation features of modern CPUs to run VMs at near-native speed. An
important feature of QEMU over most other VM implementations is that it
allows VM instances to be easily started and controlled from the command
line. This includes the fully automated starting, running and stopping of a
VM in a derivation. Furthermore, QEMU provides special support for boot-
ing Linux-based operating systems: it can directly boot from a kernel and
initial ramdisk image on the host filesystem, rather than requiring a full hard
disk image with that kernel installed. (The initial ramdisk in Linux is a small
filesystem image responsible for mounting the real root filesystem.) For in-
stance, the system.build.vm derivation generates essentially this script:
${pkgs.qemu_kvm}/bin/qemu-system-x86_64 -smb / \
  -kernel ${config.boot.kernelPackages.kernel} \
  -initrd ${config.system.build.initialRamdisk} \
  -append "init=${config.system.build.bootStage2} systemConfig=${config.system.build.toplevel}"
The system.build.vm derivation does not build a virtual hard disk image for
the VM, as is usual. Rather, the initial ramdisk of the VM mounts the Nix
store of the host through the network filesystem CIFS (the -smb / option above);
QEMU automatically starts a CIFS server on the host to service requests from
the guest. This is a crucial feature: the set of dependencies of a system is
hundreds of megabytes in size at the least, so to build such an image ev-
ery time we reconfigure the VMs would be very wasteful in time and space.
Thus, rebuilding a VM after a configuration change usually takes only a few
seconds.
The VM start script does create an empty ext3 root filesystem for the guest
at startup, to hold mutable state such as the contents of /var or the system
account file /etc/passwd. Thanks to sparse allocation of blocks in the virtual disk
image, image creation takes a fraction of a second. NixOS’s boot process is
self-initialising: it initialises all state needed to run the system. For interactive
use, the filesystem is preserved across restarts of the VM, saved in the image
file ./hostname.qcow2.
QEMU provides virtualised network connectivity to VMs. The VM has
a network interface, eth0, that allows it to talk to the host. The guest has
IP address 10.0.2.15, QEMU’s virtual gateway to the host is 10.0.2.2, and the
CIFS server is 10.0.2.4. This feature is implemented entirely in user space: it
requires no root privileges.
The test script executes commands on the VM using a root shell running
on the VM that receives commands from the host through a TCP connection
between the VM and the host. (The remotely accessible root shell is provided
by a NixOS module added to the machine configuration by makeTest and does
not exist in normal use.) QEMU allows TCP ports in the guest network to be
forwarded to Unix domain sockets [Stevens and Rago, 2005] on the host. This
is important for security: we do not want anybody other than the test script
connecting to the port. Also, in continuous build environments any number
of builds may execute concurrently; the use of fixed TCP ports on the host
would preclude this.
7.3 D I S T R I B U T E D T E S T S
Many typical system tests are distributed: they require multiple machines
to execute. Therefore a natural extension of the declarative machine spec-
ifications in the previous section is to specify entire networks of machines,
including their topologies.
7.3.1 Specifying networks
Figure 7.2 shows a small automated test for Quake 3 Arena. As described in
Section 7.1, it verifies that clients can successfully join a game running on a
non-graphical server:
117 Here, makeTest is called with a nodes argument instead of a single ma-
chine. This is a set specifying all the machines in the network; each
attribute is a NixOS configuration module. This model has the same
definition as the logical network specifications described in Chapter 6.
118 In our example, we specify a network of two machines. The server ma-
chine automatically starts a Quake server daemon.
makeTest {
  nodes =                                                           117
    { server =                                                      118
        { config, pkgs, ... }:
        {
          jobs.quake3Server =
            { startOn = "startup";
              exec =
                "${pkgs.quake3demo}/bin/quake3"
                + " +set dedicated 1"
                + " +set g_gametype 0"
                + " +map q3dm7 +addbot grunt"
                + " 2> /tmp/log";
            };
        };

      client =                                                      119
        { config, pkgs, ... }:
        {
          services.xserver.enable = true;
          environment.systemPackages = [ pkgs.quake3demo ];
        };
    };

  testScript = ''                                                   120
    startAll;
    $server->waitForJob("quake3-server");
    $client->waitForX;
    $client->succeed("quake3 +set name Foo +connect server &");
    sleep 40;
    $server->succeed("grep 'Foo.*entered the game' /tmp/log");
    $client->screenshot("screen.png");
  '';
}
Figure 7.2 Specification of a Quake client/server regression test
119 The client machine runs an X11 graphical user interface and has the
Quake client installed, but otherwise does nothing.
120 The test script in the example first starts all machines and waits until
they are ready. This speeds up the test as it boots the machines in paral-
lel; otherwise they are only booted on demand, i.e., when the test script
performs an action on the machine. It then executes a command on the
client to start a graphical Quake client and connect to the server. Af-
ter a while, we verify on the server that the client did indeed connect.
The derivation will fail to build if this is not the case. Finally, we make
a screenshot of the client to allow visual inspection of the end state, if
desired.
The attribute test returned by makeTest evaluates to a derivation that exe-
cutes the VMs in a virtual network and runs the given test suite. The test
script uses additional test primitives, such as waitForJob (which waits until the
given system service has started successfully) and screenshot (which takes a
screenshot of the virtual display of the VM).
GUI testing is a notoriously difficult subject [Grechanik et al., 2009]. The
point here is not to make a contribution to GUI testing techniques per se, but
nodes = {
tracker = 121
{ config, pkgs, ... }:
{
environment.systemPackages = [ pkgs.transmission pkgs.bittorrent ];
services.httpd.enable = true;
services.httpd.documentRoot = "/tmp";
};
router = 122
{ config, pkgs, ... }:
{
environment.systemPackages = [ iptables miniupnpd ]; 123
virtualisation.vlans = [ 1 2 ];
};
client1 = 124
{ config, pkgs, nodes, ... }:
{
environment.systemPackages = [ transmission ];
virtualisation.vlans = [ 2 ];
networking.defaultGateway = nodes.router.config.networking.ifaces.eth2.ipAddress;
};
client2 = 125
{ config, pkgs, ... }:
{
environment.systemPackages = [ transmission ];
};
};
Figure 7.3 Network specification for the Transmission regression test
to show that we can easily set up the infrastructure needed for such tests. In
the test script, we can run any desired automated GUI testing tool.
The virtual machines can talk to each other because they are connected
together into a virtual network. Each VM has a network interface eth1 with
an IP address in the private range 192.168.1.n assigned in sequential order by
makeTest. (Recall that each VM also has a network interface eth0 to communi-
cate with the host.) QEMU propagates any packet sent on this interface to all
other VMs in the same virtual network. The machines are assigned hostnames
equal to the corresponding attribute name in the model, so the machine built
from the server attribute has hostname server.
7.3.2 Complex topologies
The Quake test has a trivial network topology: all machines are on the same
virtual network. The test of the port forwarding feature in the Transmis-
sion Bittorrent client (described in Section 7.1) requires a more complicated
topology: an “inside” network, representing a typical home network behind
a router, and an “outside” network, representing the Internet. The router
should be connected to both networks and provide Network Address Trans-
testScript = ''
# Enable NAT on the router and start miniupnpd.
$router->succeed(
"iptables -t nat -F", ...
"miniupnpd -f ${miniupnpdConf}");
# Create the torrent and start the tracker.
$tracker->succeed(
"cp ${file} /tmp/test",
"transmissioncli -n /tmp/test /tmp/test.torrent",
"bittorrent-tracker --port 6969 &");
$tracker->waitForOpenPort(6969);
# Start the initial seeder.
my $pid = $tracker->background(
"transmissioncli /tmp/test.torrent -w /tmp");
# Download from the first (NATted) client.
$client1->succeed("transmissioncli http://tracker/test.torrent -w /tmp &");
$client1->waitForFile("/tmp/test");
# Bring down the initial seeder.
$tracker->succeed("kill -9 $pid");
# Now download from the second client.
$client2->succeed("transmissioncli http://tracker/test.torrent -w /tmp &");
$client2->waitForFile("/tmp/test");
'';
Figure 7.4 Test script for the Transmission regression test
lation (NAT) from the inside network to the outside. Machines on the inside
should not be directly reachable from the outside. Thus, we cannot do this
with a single virtual network. To support such scenarios, makeTest can create
an arbitrary number of virtual networks, and allows each machine specifica-
tion to declare in the option virtualisation.vlans to what networks they should be
connected.
Figure 7.3 shows the specification of the machines for the Transmission test.
It has two virtual networks, identified as 1 (the “outside” network) and 2 (the
“inside”):
122 The router machine is connected to both networks.
124 The client1 machine is connected to network 2.
125 The client2 machine is connected to network 1. (If virtualisation.vlans is omitted, it defaults to 1.)
121 The tracker runs the Apache web server to make torrent files available
to the clients.
123 The configuration further specifies what packages should be installed on
what machines, e.g., the router needs the iptables and miniupnpd packages
for its NAT and UPnP-IGD functionality.
The test, shown in Figure 7.4, proceeds as follows. We first initialise NAT
on the router. We then create a torrent file on the tracker and start the tracker
program, a central Bittorrent component that keeps track of the clients that
are sharing a given file, on port 6969. Also on the tracker we start the initial
seeder, a client that provides the initial copy of the file so that other clients
can obtain it. We then start a download on the client behind the router and
wait until it finishes. If Transmission and miniupnpd work correctly in concert,
the router should now have opened a port forwarding that allows the second
client to connect to the first client. To verify that this is the case, we shut down
the initial seeder and start a download on the second client. This download
can only succeed if the first client is reachable through the NAT router.
Each virtual network is implemented as a separate QEMU network; thus a
VM cannot send packets to a network to which it is not connected. Machines
are assigned IP addresses 192.168.n.m, where n is the number of the network
and m is the number of the machine, and have Ethernet interfaces connected
to the requested networks. For example, the router will have interfaces eth1
with IP address 192.168.1.3 and eth2 with address 192.168.2.3, while the first
client will only have an interface eth1 with IP address 192.168.2.1. The test
infrastructure provides operations to simulate events such as network outages
or machine crashes.
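For illustration, in a test like the Quake example of Figure 7.2, a network outage could be simulated as follows. This is a sketch only: the primitive names block and unblock are assumptions about the test driver's interface.

testScript = ''
  startAll;
  # Temporarily cut the server off from the virtual network
  # (block/unblock are assumed names for the outage primitives).
  $server->block;
  # ... exercise the client while the server is unreachable ...
  $server->unblock;
'';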
7.4 S E RV I C E - O R I E N T E D T E S T S
The techniques described in the previous section provide a distributed test-
ing framework on infrastructure level. These testing facilities can be used in
conjunction with Disnix to automatically test service-oriented systems.
7.4.1 Writing test specifications
Figure 7.5 shows a specification that tests the web services implementation of
the StaffTracker system, described earlier in Chapter 5:
126 The local attribute src defines the location of the source tarball of the
StaffTracker system, containing the source code and deployment expres-
sions. This attribute is used in various function invocations in this spec-
ification.
127
The build attribute is bound to the disnixos.buildManifest function invoca-
tion, which builds the Disnix manifest file, using the given services,
network and distribution specifications included with the source code.
The command-line tool disnix-manifest serves the same purpose.
128 The tests attribute is bound to the disnixos.disnixTest function invocation,
which builds and launches a network of virtual machines and uses Disnix to
deploy the StaffTracker system built previously by the function bound to the
build attribute. Furthermore, it performs a number of specified test cases in
this virtual network.
129 Likewise, the testScript specifies Perl code performing test cases in the
virtual network. Here, we first wait until the web application front-
{ nixpkgs ? <nixpkgs>, nixos ? <nixos>, system ? builtins.currentSystem }:
let
pkgs = import nixpkgs { inherit system; };
disnixos = import "${pkgs.disnixos}/share/disnixos/testing.nix" {
inherit nixpkgs nixos;
};
src = /home/sander/WebServices-0.1.tar.gz; 126
in
rec {
build = disnixos.buildManifest { 127
name = "WebServices-0.1";
version = builtins.readFile ./version;
inherit src;
servicesFile = "deployment/DistributedDeployment/services.nix";
networkFile = "deployment/DistributedDeployment/network.nix";
distributionFile = "deployment/DistributedDeployment/distribution.nix";
};
tests = disnixos.disnixTest { 128
name = "WebServices-0.1";
inherit src;
manifest = build;
networkFile = "deployment/DistributedDeployment/network.nix";
testScript = '' 129
# Wait until the front-end application is deployed
$test2->waitForFile("/var/tomcat/webapps/StaffTracker/stafftable.jsp");
# Capture the output of the entry page
my $result =
$test3->mustSucceed("curl --fail http://test2:8080/StaffTracker/stafftable.jsp");
# The entry page should contain my name :-)
if ($result =~ /Sander/) {
print "Entry page contains Sander!\n";
}
else {
die "Entry page should contain Sander!\n";
}
# Start Firefox and take a screenshot
$test3->mustSucceed("firefox http://test2:8080/StaffTracker &");
$test3->waitForWindow(qr/Firefox/);
$test3->mustSucceed("sleep 30");
$test3->screenshot("screen");
'';
};
Figure 7.5 stafftracker.nix: A Disnix test expression implementing test cases for the
StaffTracker web services example
end is deployed. Then we check whether the entry page, which shows the
staff members of a university department, is accessible and whether my
name is listed in it. Finally, Firefox is launched to make a screenshot
of the entry page.
By running the following command-line instruction, the test suite can be
executed non-interactively:
$ nix-build stafftracker.nix -A tests
Figure 7.6 Using disnixos-vm-env to deploy the StaffTracker in a network of three
virtual machines
Apart from running the tests from a command-line instruction, the expression
can also be executed in Hydra.
7.4.2 Deploying a virtual network
In addition to running non-interactive test suites, the DisnixOS tool suite also
provides the disnixos-vm-env tool. The usage of disnixos-vm-env is identical to
disnixos-env described in Chapter 6:
$ disnixos-vm-env -s services.nix -n network.nix \
-d distribution.nix
This command-line invocation generates a virtual network resembling the
machines defined in the network specification, then launches the virtual ma-
chines interactively and finally deploys the service-oriented system using Dis-
nix. Figure 7.6 shows what the result may look like if the system is deployed
in a virtual network consisting of three virtual machines.
Figure 7.7 shows the architecture of the disnixos-vm-env tool. First, the net-
work model is used by a NixOS tool called nixos-build-vms, which builds a net-
work of virtual machine configurations closely matching the configurations
described in the network model.
Then the disnixos-geninfra tool is used to convert a network model into an
infrastructure model, which the basic Disnix toolset understands. The main
difference between the infrastructure model and the network model is that the
infrastructure model is an implicit model: it should reflect the system config-
urations of the machines in the network, but the system administrator must
Figure 7.7 Architecture of the disnixos-vm-env tool
make sure that the model matches the actual configuration. Moreover, it only
has to capture attributes required for deploying the services. The network
model is an explicit model, because it describes the complete configuration of
a system and is used to deploy a complete system from this specification. If
building a system configuration from this specification succeeds, we know
that the specification matches the given configuration.
After generating the infrastructure model, the disnix-manifest tool is invoked,
which generates a manifest file for the given configuration. Finally, the nixos-
test-driver is started, which launches the virtual machines for testing.
7.5 D I S C U S S I O N
7.5.1 Declarative model
To what extent do we need the properties of Nix and NixOS, in particular the
fact that an entire operating system environment is built from source from a
specification in a single formalism, and the purely functional nature of the
Nix store? There are many tools to automate deployment of machines. For
instance, Red Hat’s Kickstart tool installs RPM-based Linux systems from a
textual specification and can be used to create virtual machines automatically,
with a single command-line invocation [Red Hat, Inc., 2009].
However, there are many limitations to such tools:
Having a single formalism that describes the construction of an entire
network from source makes hard things easy, such as building part of
the system with coverage analysis. In a tool such as Kickstart, the bi-
nary software packages are a given; we cannot easily modify the build
processes of those packages.
For testing in virtual machines, it is important that VMs can be built
efficiently. With Nix, this is the case because the VM can use the host’s
Nix store. With other package managers, that is not an option because
the host filesystem may not contain the (versions of) packages that a VM
needs. One would also need to be root to install packages on the host,
making any such approach undesirable for automated test suites.
For automatic testing, one needs a formalism to describe the desired
configurations. In NixOS this is already given: it is what users use to de-
scribe regular system configurations. In conventional Unix systems, the
configuration is a result of many “unmanaged” modifications to system
configuration files (e.g. in /etc). Thus, given an existing Unix system, it is
hard to distill the “logical” configuration of a system (i.e., specification
in terms of high-level requirements) from the multitude of configuration
files.
7.5.2 Operating system generality
The network specifications described in this chapter build upon NixOS: they
build NixOS operating system instances. This obviously limits the general-
ity of our current implementation: a test that must run on a Windows ma-
chine cannot be accommodated. In this sense, it shows an “ideal” situation,
in which entire networks of machines can be built from a purely functional
specification. Nixpkgs does contain functions to build virtual machines for
Linux distributions based on the RPM or Apt package managers, such as Fe-
dora and Ubuntu. These can be supported in network specifications, though
they would be harder to configure since they are not declarative. On the other
hand, platforms such as Windows that lack fine-grained package manage-
ment mechanisms are difficult to support in a useful manner. Such platforms
would require pre-configured virtual images, which are inflexible.
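To sketch what this support looks like, the expression below runs a trivial build step inside an RPM-based disk image. The vmTools attribute names used here (runInLinuxImage, diskImages.fedora16x86_64) are assumptions recalled from Nixpkgs of that era and may differ between versions:

with import <nixpkgs> { };

# Run a small command inside a Fedora-based VM image and capture its output.
vmTools.runInLinuxImage (stdenv.mkDerivation {
  name = "rpm-query-demo";
  diskImage = vmTools.diskImages.fedora16x86_64;
  buildCommand = ''
    rpm -qa > $out   # list the RPM packages present in the guest image
  '';
})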
The Nix package manager itself is portable across a variety of operating
systems, and the generation of virtual machines works on any Linux host
machine (and probably other operating systems supported by QEMU). The
fact that the guest OS is NixOS is usually fine for automated regression test
suites, since many test cases do not care about the specific type of guest Linux
distribution.
7.5.3 Test tool generality
Our approach can support any fully non-interactive test tool that can build and
run on Linux. Since Nix derivations run non-interactively, tools that require
user intervention (e.g., interactive GUI testing tools) are not supported. Like-
wise, only systems with a fully automated build process are supported.
Figure 7.8 Part of the distributed code coverage analysis report for the Subversion
web service
7.5.4 Distributed coverage analysis
Declarative specifications of networks and associated test suites make it easy
to perform distributed code coverage analysis. Again, we make no contributions
to the technique of coverage analysis itself; we improve its deployability. First,
the abstraction facilities of the Nix expression language make it easy to specify
that parts of the dependency graph of a large system are to be compiled with
coverage instrumentation (or any other form of build-time instrumentation
one might want to apply). Second, by collecting coverage data from every
machine in a test run of a virtual network, we get more complete coverage
information. Consider for instance, a typical configuration of the Subversion
revision control system: clients run the Subversion client software, while a
server runs a Subversion module plugged into the Apache web server to pro-
vide remote access to repositories through the WebDAV protocol. These are
both built from the Subversion code base. If a client performs a checkout from
a server, different paths in the Subversion code will be exercised on the client
than on the server. The coverage data on both machines should be combined
to get a full picture.
We can add coverage instrumentation to a package using the configuration
value nixpkgs.config.packageOverrides. This is a function that takes the original
contents of the Nix Packages collection as an argument, and returns a set of
replacement packages:
nixpkgs.config.packageOverrides = pkgs: {
subversion = pkgs.subversion.override {
stdenv = pkgs.addCoverageInstrumentation pkgs.stdenv;
};
};
The original Subversion package, pkgs.subversion, contains a function, override,
that allows the original dependencies of the package to be overridden. In this
case, we pass a modified version of the standard build environment (stdenv)
that automatically adds the flag --coverage to every invocation of the GNU C
Compiler. This causes GCC to instrument object code to collect coverage data
and write it to disk. Most C or C++-based packages can be instrumented in
this way, including the Linux kernel.
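The definition of addCoverageInstrumentation itself is not shown in this chapter; the following is a minimal sketch of the idea (not the actual Nixpkgs code), wrapping a stdenv so that every compiler invocation receives --coverage:

# Sketch: wrap a stdenv such that all derivations built with it are compiled
# with --coverage, causing GCC to emit gcov instrumentation and link libgcov.
addCoverageInstrumentation = baseStdenv:
  baseStdenv // {
    mkDerivation = args: baseStdenv.mkDerivation (args // {
      NIX_CFLAGS_COMPILE =
        toString (args.NIX_CFLAGS_COMPILE or "") + " --coverage";
    });
  };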
The test script automatically collects the coverage data from each machine
in the virtual network at the conclusion of the test, and writes it to $out. The
function makeReport then combines the coverage data from each virtual ma-
chine and uses the lcov tool [Larson et al., 2003] to make a set of HTML pages
showing a coverage report and each source file decorated with the line cover-
age. For example, we have built a regression test for the Subversion example
with coverage instrumentation on Apache, Subversion, Apr, Apr-util and the
Linux kernel. Figure 7.8 shows a small part of the distributed coverage anal-
ysis report resulting from the test suite run. The line and function coverage
statistics combine the coverage from each of the four machines in the network.
One application of distributed coverage analysis is to determine code cov-
erage of large systems, such as entire Linux distributions, on system-level tests
(rather than unit tests at the level of individual packages). This is useful for
system integrators, such as Linux distributors, as it reveals the extent to which
test suites exercise system features. For instance, the full version of the cov-
erage report in Figure 7.8 readily shows which kernel and Apache modules
are executed by the tests, often at a very specific level: e.g., the ext2 filesys-
tem does not get executed at all, while ext3 is used, except for its extended
attributes feature.
7.5.5 Continuous builds
The ability to build and execute a test with complex dependencies is very
valuable for continuous integration. A continuous integration tool (e.g.
CruiseControl) continuously checks out the latest source code of a project, builds it,
runs tests, and produces a report [Fowler and Foemmel, 2005]. A problem
with the management of such tools is to ensure that all the dependencies
of the build and the test are available on the continuous build system (e.g., a
database server to test a web application). In the worst case, the administrator
of the continuous build machines must install such dependencies manually.
By contrast, the single command
$ nix-build subversion.nix -A report
causes Nix to build or download everything needed to produce the coverage
report for the Subversion web service test: the Linux kernel, QEMU, the
C compiler, the C library, Apache, the coverage analysis tools, and so on.
This automation makes it easy to stick such tests in a continuous build sys-
tem. In fact, there is a Nix-based continuous build system, Hydra
(http://hydra.nixos.org), that continuously checks out Nix expressions
describing build tasks from a revision control system, builds them, and makes
the output available through a web interface.
7.6 E VA L U AT I O N
We have created a number of tests¹ using the virtual machine-based testing
techniques described in this chapter. These are primarily used as regression
tests for NixOS: every time a NixOS developer commits a change to NixOS
¹ The outputs of these tests can be found at http://hydra.nixos.org/jobset/nixos/trunk/jobstatus. The Nix expressions are at https://svn.nixos.org/repos/nix/nixos/trunk/tests.
or Nixpkgs, our continuous integration system rebuilds the tests, if necessary.
The tests are the following:
Several single-machine tests, e.g. the OpenSSH test, and a test for the
KDE Plasma desktop environment that builds a NixOS machine and
verifies that a user can successfully log into KDE and start several ap-
plications.
A two-machine test of an Apache-based Subversion service, which per-
forms HTTP requests from a client machine to create repositories and
user accounts on the server through the web interface, and executes Sub-
version commands to check out from and commit to repositories. It is
built with coverage instrumentation to perform a distributed coverage
analysis.
A four-machine test of Trac, a software project management service [Edge-
wall Software, 2009] involving a PostgreSQL database, an NFS file server,
a web server and a client.
A four-machine test of a load-balancing front-end (reverse proxy) Apache
server that sits in front of two back-end Apache servers, along with a
client machine. It uses test primitives that simulate network outages to
verify that the proxy continues to work correctly if one of the back-ends
stops responding.
A three-machine variant of the Quake 3 test in Figure 7.2.
The four-machine Transmission test in Figure 7.3.
Several tests of the NixOS installation CD. An ISO-9660 image of the
installation CD is generated and used to automatically install NixOS
on an empty virtual hard disk. The function that performs this test is
parametrised with test script fragments that partition and format the
hard disk. This allows many different installation scenarios (e.g., “XFS
on top of LVM2 on top of RAID 5 with a separate /boot partition”) to be
expressed concisely.
The installation test is a distributed test, because the NixOS installa-
tion CD is not self-contained: during installation, it downloads sources
and binaries for packages selected by the user from the Internet, mostly
from the NixOS distribution server at http://nixos.org/. Thus, the test con-
figuration contains a web server that simulates nixos.org by serving the
required files.
A three-machine test of NFS file locking semantics in the Linux kernel,
e.g., whether NFS locks are properly maintained across server crashes.
(This test is slow because the NFS protocol requires a 90-second grace
period after a server restart.)
Test # VMs Duration (s) Memory (MiB)
empty 1 45.9 166
openssh 1 53.7 267
kde4 1 140.4 433
subversion 2 104.8 329
trac 4 159.4 756
proxy 4 65.4 477
quake3 3 80.6 528
transmission 4 89.5 457
installation 2 302.7 751
nfs 3 259.7 358
Table 7.1 Test resource consumption
For a continuous test to be effective, it must be timely: the interval between
the commit and the completion of the test must be reasonably short. Table 7.1
shows the execution time and memory consumption for the tests listed above,
averaged over five runs. As a baseline, the test empty starts a single machine
and shuts down immediately.
The execution time is the elapsed wall time on an idle 4-core Intel Core
i5 750 host system with 6 GiB of RAM running 64-bit NixOS. The memory
consumption is the peak additional memory use compared to the idle system.
(The host kernel caches were cleared before each test run by executing echo
3 > /proc/sys/vm/drop_caches.) All VMs were configured with 384 MiB of RAM,
though due to KVM’s para-virtualised “balloon” driver the VMs typically use
less host memory than that. The host kernel was configured to use KVM’s
same-page merging feature, which lets it replace identical copies of memory
pages with a single copy, significantly reducing host memory usage. (See
e.g. [Miłoś et al., 2009] for a description of this approach.)
Table 7.1 shows that the tests are fast enough to execute from a continuous
build system. Many optimisations are possible, however: for instance, VMs
with identical kernels and initial ramdisks could be started from a shared,
pre-computed snapshot.
7.7 R E L AT E D W O R K
Most work on deployment of distributed systems takes place in the context of
system administration research. Cfengine [Burgess, 1995] maintains systems
on the basis of declarative specifications of actions to be performed on each
(class of) machine. Stork [Cappos et al., 2007] is a package management sys-
tem used to deploy virtual machines in the PlanetLab testbed. These and most
other deployment tools have convergent models [Traugott and Brown, 2002],
meaning that due to statefulness, the actual configuration of a system after
an upgrade may not match the intended configuration. By contrast, NixOS’s
purely functional model ensures congruent behaviour: apart from mutable
state, the system configuration always matches the specification.
Virtualisation does not necessarily make deployment easier; apart from
simplifying hardware management, it may make it harder, since without
proper deployment tools, it simply leads to more machines to be managed
[Reimer et al., 2008].
MLN [Begnum, 2006], a tool for managing large networks of VMs, has
a declarative language to specify arbitrary network topologies. It does not
manage the contents of VMs beyond a templating mechanism.
Our VM testing approach currently is only appropriate for relatively small
virtual networks. This is usually sufficient for regression testing of typical
bugs, since they can generally be reproduced in a small configuration. It
is not appropriate for scalability testing or network experiments involving
thousands of nodes, since all VMs are executed in the same derivation and
therefore on the same host. However, depending on the level of virtualisation
required for a test, it is possible to use virtualisation techniques that scale to
hundreds of nodes on a single machine [Hibler et al., 2008].
There is a growing body of research on testing of distributed systems;
see [Rutherford et al., 2008, Section 5.4] for an overview. However, deploy-
ment and management of test environments appear to be a somewhat ne-
glected issue. An exception is Weevil [Wang et al., 2005], a tool for the de-
ployment and execution of experiments in testbeds such as PlanetLab. We are
not aware of tools to support the synthesis of VMs in automatic regression
tests as part of the build processes of software packages.
During unit testing, environmental dependencies such as databases are
often simulated using test stubs or mock objects [Freeman et al., 2004]. These
are sometimes used due to the difficulty of having a “real” implementation
of the simulated functionality. Generally, however, stubs and mocks allow
more fine-grained control over interactions than would be feasible with real
implementations, e.g., when testing against I/O errors that are hard to trigger
under real conditions.
7.8 C O N C L U S I O N
In this chapter, we have shown a method for synthesizing virtual machines
from declarative specifications to perform integration or system tests. This
allows such tests to be easily automated, an essential property for regression
testing. It enables developers to write integration tests for their software that
would otherwise require a great deal of manual configuration, and would
likely not be done at all.
Part IV
General Concepts
8
Atomic Upgrading of Distributed Systems
A B S T R A C T
In this thesis, we have implemented two tools to perform reliable, repro-
ducible and efficient distributed deployment and upgrade tasks. Disnix is a
distributed deployment tool working on service level and DisnixOS provides
complementary deployment support on infrastructure level. A key aspect of
making their upgrades reliable is making the upgrade process as atomic
as possible. We either want to have the current deployment state or the new
deployment state, but never an inconsistent mix of the two. Furthermore, it is
also desirable that end-users are not able to notice that the system is temporar-
ily inconsistent while an upgrade is performed. In this chapter, we describe
how the Nix package manager achieves atomic upgrades and rollbacks on
local machines and we explore how we can expand these concepts to dis-
tributed systems. We have implemented a trivial case study to demonstrate
the concepts.
8.1 I N T R O D U C T I O N
A distributed system can be described as a collection of independent com-
puters that appear to a user as one logical system [Nadiminti et al., 2006].
Processes in a distributed system work together to reach a common goal by
exchanging data with each other.
A major problem of upgrading distributed systems is integrity: if we want
to change the deployment state of a distributed system to a new state, for
instance by adding, removing or replacing components on computers or mi-
grating components from one computer to another, we do not want to end up
with a system in an inconsistent state, i.e., a mix of the old and new configura-
tions. Thus, if we upgrade a distributed system, there should never be a point
in time where a new version of a service on one computer can talk to an old
version of a service on another computer.
To guarantee the correctness of the deployment we need some notion of a
transactional, atomic commit similar to database systems. That is, the transi-
tion from the current deployment state to the new deployment state should be
performed completely or not at all, and it should not be observable half-way
through.
In this thesis, we have described two tools that provide a solution for this
problem, namely Disnix covered in Chapter 4, which is designed for deploy-
ment on service level and DisnixOS (using charon) covered in Chapter 6, de-
signed for deployment on infrastructure level. In this chapter, we explain how
Nix performs local upgrades and we explore how to expand this feature to
support atomic upgrades of distributed systems.
8.2 ATO M I C U P G R A D I N G I N N I X
As we have explained earlier in Section 2.2, the Nix package manager is ca-
pable of performing atomic upgrades. Atomic upgrades are possible, because
of a number of important Nix characteristics.
8.2.1 Isolation
The first important aspect is that Nix isolates component variants using unique
file names, such as /nix/store/pz3g9yq2x2ql...-firefox-12.0. As we have seen, the
string pz3g9yq2x2ql... is a 160-bit cryptographic hash of the inputs to the build
process of the package: its source code, the build script, dependencies such
as the C compiler, etc. This scheme ensures that different versions or variant
builds of a package do not interfere with each other in the file system. For
instance, a change to the source code (e.g., due to a new version) or to a de-
pendency (e.g., using another version of the C compiler) will cause a change
to the hash, and so the resulting package will end up under a different path
name in the Nix store.
Furthermore, components never change after they have been successfully built.
There is no situation where the deployment of a Nix package overwrites or
removes files belonging to another component. When building the same com-
ponent with different parameters, we always get a new package with a new
unique hash code.
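For example, the following sketch (using a hypothetical package called example) defines two builds that differ only in a configure flag; their input hashes, and therefore their store paths, differ, so both variants can coexist in the Nix store:

{ stdenv }:

rec {
  exampleDefault = stdenv.mkDerivation {
    name = "example-1.0";
    src = ./example-1.0.tar.gz;   # hypothetical source tarball
  };

  # Identical except for one build parameter, which yields a different hash
  # and hence a different store path.
  exampleWithFeatureX = stdenv.mkDerivation {
    name = "example-1.0";
    src = ./example-1.0.tar.gz;
    configureFlags = "--enable-feature-x";
  };
}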
If the build procedure of a component fails for some reason, it is
marked as invalid and removed when the Nix garbage collector is invoked.
8.2.2 Profiles
The second important aspect is the way Nix profiles are implemented. As
described earlier, Nix profiles are symlink trees, in which symlinks refer to
components in the Nix store; their primary purpose is to make it easier for
end users to access installed Nix packages.
For example, if an end-user requests to install Mozilla Firefox in his Nix
profile, e.g.:
$ nix-env -i firefox
Then the Nix package manager generates a new Nix profile, which is a
component residing in the Nix store containing symlinks to the contents of
all packages installed in the user’s Nix profile. The generated Nix profile has
the same contents as the previous generation, with symlinks to the contents
of the Mozilla Firefox package added.
After generating the Nix profile component, the symlink referring to the
active Nix profile generation is flipped so that it points to the newly generated
/home/sander/.nix-profile
/nix/var/nix/profiles
   default
   default-23
   default-24
/nix/store
   q2ad12h4alac...-user-env
      bin
         firefox
   z13318h6pl1z...-user-env
   pz3g9yq2x2ql...-firefox-12.0
      bin
         firefox
   6cgraa4mxbg1...-glibc-2.13
      lib
         libc.so.6
Figure 8.1 A Nix profile configuration scenario
Nix profile, making the new set of installed packages available to the end-
user. Because flipping symlinks is an atomic operation in UNIX, systems can
be upgraded atomically.
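Conceptually, the flip amounts to creating the new generation symlink under a temporary name and renaming it over the old one; rename() replaces the symlink atomically. The following shell sketch illustrates the idea (Nix performs these steps internally):

$ ln -s /nix/store/q2ad12h4alac...-user-env /nix/var/nix/profiles/default-24
$ ln -s default-24 /nix/var/nix/profiles/default.tmp
$ mv -T /nix/var/nix/profiles/default.tmp /nix/var/nix/profiles/default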
If any of these previous steps fails, then the installation procedure stops
immediately. Because Nix store components installed in an older Nix profile
can safely coexist next to components belonging to a newer Nix profile and
because files are never overwritten or automatically removed, a system is still
in its previous deployment state if the upgrade procedure suddenly stops.
Furthermore, because components belonging to the new configuration are
not part of the current active Nix profile, they are considered garbage and are
automatically removed when the Nix garbage collector is invoked.
Figure 8.1 shows an example of a hierarchy of symlinks, capturing a de-
ployment scenario after Mozilla Firefox has been installed in a user’s Nix
profile. /home/sander/.nix-profile is a symlink residing in my home directory. By
adding /home/sander/.nix-profile/bin to the PATH environment variable, programs
can be conveniently accessed by typing them on the command-line.
The symlink in my home directory points to /nix/var/nix/profiles/default repre-
senting the active version of the default profile. The default profile symlink
points to a generation symlink that is currently used. The generation symlink
refers to a Nix store component containing symlinks that refer to the contents
of the packages installed by the user, including Mozilla Firefox.
If the user wants to undo the operation, they can perform a rollback by run-
ning:
$ nix-env --rollback
The rollback operation is trivial. To undo the previous deployment op-
eration, only the /nix/var/nix/profiles/default symlink has to be flipped, so that it
points to default-23 instead of default-24. This flip is again an atomic operation.
Apart from ordinary Nix user profiles, the same concepts are also applied
in NixOS. As explained earlier in Section 2.3, the NixOS system profile, con-
sisting of all static parts of a Linux distribution and activation scripts which
take care of the dynamic parts of a system, is managed the same way as
Nix user profiles and can also be atomically upgraded on a single machine by
flipping symlinks.
8.2.3 Activation
In NixOS, merely flipping symlinks is not enough to fully upgrade a system.
After the static parts of a system have been successfully upgraded, the
impure parts of the system must also be updated, such as the contents of the /var
directory and several other dynamic parts of the filesystem. Furthermore,
obsolete system services must be stopped and new system services must be
started. This process is imperative and cannot be done atomically.
If the activation process of a new NixOS configuration fails, we can do a
rollback of the NixOS profile and then activate the previous NixOS configu-
ration. This brings the system back to its previous deployment state, which
should still work.
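On a single NixOS machine, both the profile flip and the activation are typically driven by one command:

$ nixos-rebuild switch

which builds the new system configuration, flips the system profile symlink and then runs the activation script of the new configuration.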
8.3 D I S T R I B U T E D AT O M I C U P G R A D I N G
Nix supports local atomic upgrades for static parts of a system. As we have
explained, atomic upgrading is based on the fact that components can be
isolated and do not interfere with each other and the fact that symlinks can
be atomically flipped in UNIX. In this section, we explore how we can expand
these concepts to distributed systems.
8.3.1 Two phase commit protocol
The two phase commit protocol is a distributed algorithm that coordinates all
the processes that participate in a distributed atomic transaction [Skeen and
Stonebraker, 1987]. When processing transactions in database systems, typi-
cally a number of tasks are performed until the commit point is reached, then
either a commit operation is executed, which makes the changes permanent or
an abort, performing a rollback so that none of its effects persists.
The two phase commit protocol implements distributed transactions by
dividing the two phases of a transaction over nodes in the network and by
turning it into an agreement problem. The transaction is initiated by a node
called the coordinator. The nodes participating in the transaction are called the
cohorts. The two phase commit protocol is implemented roughly as follows:
First, the commit-request phase is executed:
The coordinator sends a query to commit message to all cohorts and waits
until it has received a reply from all cohorts. Typically, cohorts ensure
that no other processes can interfere with the transaction.
Every node executes the necessary steps to reach the commit point. A
cohort replies with a yes message if all its steps have been successfully
executed, otherwise it replies with a no message.
Second, the commit phase is executed:
If all cohorts have responded with a yes message, the coordinator sends
a commit message to all cohorts. If any cohort has responded with a no
message then all the cohorts receive an abort instruction.
Cohorts receiving the commit message perform the commit operation and
cohorts receiving the abort message perform the abort operation.
After the previous operations have been performed, an acknowledgement
message is sent to the coordinator.
The protocol achieves its goal in many cases, even when a temporary sys-
tem failure occurs involving a process or communication link and is thus
widely utilised. The protocol is not completely resilient to all failures, and
in rare cases user intervention is needed to remedy an outcome. The major
disadvantage is that the protocol is a blocking protocol. If the coordinator fails
permanently, then the cohorts participating in a transaction may be blocked
forever. There are a number of improved algorithms, such as the three-phase
commit [Skeen and Stonebraker, 1987] that remedies the blocking issue with
time-outs. Furthermore, as opposed to local atomicity, distributed atomicity is
not a single operation, but a collection of operations on all nodes participating
in a transaction.
8.3.2 Mapping Nix deployment operations
As we have seen, the two phase commit protocol can be used to implement
distributed transactions by splitting the phases of local transactions and turning
them into an agreement problem. An interesting question is whether we can
combine this protocol with Nix primitives to support distributed atomic upgrades.
By looking at how the actual deployment processes of Disnix and Charon
are implemented, this mapping is quite trivial. The commit-request phase in
Disnix and Charon (which we call distribution phase in our tools) basically
consists of the following steps:
Building components
Transferring component closures
As we have seen, components built by the Nix package manager are iso-
lated and never interfere with other components. If all components have been
successfully built, all the component closures are transferred to the target
machines in the network. This process is also safe as all dependencies are
guaranteed to be included and never interfere with components belonging to
the currently active configuration.
If any of these steps fails, nothing needs to be done to undo the changes, as
the components of the old configuration still exist and the new configuration
has not been activated yet. Components belonging to the new configura-
tion are considered garbage by the Nix garbage collector. The commit phase
(which we call transition phase in Disnix and Charon) consists of the following
steps:
Flipping Nix profiles
Deactivating obsolete components and activating new components
The first step flips the symlinks of the Nix profiles making the new con-
figuration the active configuration, which is an atomic operation, as we have
explained earlier. After the static configurations have become active, the Dis-
nix activation scripts or NixOS activation scripts are executed, taking care of
activating the dynamic parts of a system, which is an impure operation. Dur-
ing this phase, an end-user may observe that a system is changing, e.g. certain
parts are temporarily inaccessible.
If any of these steps fails, the coordinator has to initiate a rollback by
flipping the symlinks back to the previous Nix profile generations. Furthermore,
if new services have been activated, then they must be deactivated and the
services of the previous configuration must be reactivated.
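For reference, disnix-env composes these phases from lower-level Disnix tools. The listing below only names the tools (arguments omitted) as a rough sketch of the mapping:

$ disnix-manifest ...     (distribution phase: build the services and generate a manifest)
$ disnix-distribute ...   (distribution phase: transfer the component closures to the targets)
$ disnix-activate ...     (transition phase: deactivate obsolete and activate new services)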
A way to prevent end-users from noticing the temporary inconsistency
window is to temporarily block or queue end-user access during the
activation phase. This requires cooperation of the system that is being up-
graded, as certain components must be notified. Moreover, a proxy is needed
that is capable of blocking or queuing messages for a specific connection type.
8.4 E X P E R I E N C E
To evaluate the usefulness of the distributed upgrade algorithm using Nix
primitives and to achieve full atomic distributed upgrades, we have extended
Disnix and we have implemented a trivial example case demonstrating its use.
8.4.1 Supporting notifications
As we have explained, in order to fully support distributed atomic upgrades,
we must temporarily block or queue connections in the activation phase. In
order to achieve this goal, the service that serves as an entry point to end-
users must be notified when the activation phase starts and when it ends. In
order to notify services, we have defined two additional operations
in the Disnix activation scripts system, next to the activation and deactivation
operations, described in Chapter 4.
The lock operation is called when the activation phase starts. An activation
script can notify a service, so that it can prepare itself to queue connections
or optionally reject the lock, if it is not safe to upgrade. The unlock operation
is called when the activation phase ends, so that a service can be notified to
stop queuing connections.
Figure 8.2 The Hello World example and the example augmented with a TCP proxy
8.4.2 The Hello World example
As an example case, we have implemented a very trivial “Hello world” client-
server application, shown on the left in Figure 8.2. This example case is briefly
mentioned in Chapter 4.
The system is composed of two services: the hello-world-server is a server
application listening on TCP port 5000. The hello-world-client is a client program
that connects to the server through the given port and waits for user input, which
is sent to the server. If the user sends the “hello” message, the server responds
by sending a “Hello world!” response message, which the client displays to
the end user.
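For illustration, a sketch of the corresponding Disnix services model is shown below. The customPkgs reference, the type values and the exact function header are illustrative, following the model conventions of Chapter 4:

{distribution, system}:

let
  customPkgs = import ./pkgs { inherit system; };   # hypothetical package set
in
rec {
  hello-world-server = {
    name = "hello-world-server";
    pkg = customPkgs.hello-world-server;
    type = "process";
  };

  hello-world-client = {
    name = "hello-world-client";
    pkg = customPkgs.hello-world-client;
    dependsOn = { inherit hello-world-server; };
    type = "process";
  };
}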
An upgrade can be triggered, for example, by implementing a change in
the hello-world-server source code and by running the following command-line
instruction:
$ disnix-env -s services.nix -i infrastructure.nix \
-d distribution.nix
Although the static parts of the system are safely stored next to existing
parts and can be atomically upgraded by flipping symlinks, upgrading has
a few nasty side effects. If a client is connected to the server while upgrad-
ing, then all the clients are disconnected (which also terminates the clients),
because the server is unreachable. Moreover, other clients may want to establish a
connection during the upgrade phase, which is impossible because the server
may not have been restarted yet.
8.4.3 Adapting the Hello World example
To counter these side effects and to make an upgrade process fully atomic,
several aspects of our example case must be adapted. Our adapted example,
shown on the right in Figure 8.2 contains a TCP proxy service: disnix-tcp-proxy
between the server and client. The purpose of the TCP proxy is to serve as
a monitor that keeps track of how many clients are connected and to temporarily
queue TCP connections during the activation phase.
#!/bin/bash -e
export PATH=/nix/store/q23p9z1...-disnix-tcp-proxy/bin:$PATH
case "$1" in
activate) 130
nohup disnix-tcp-proxy 6000 test1 5000 /tmp/disnix-tcp-proxy-5000.lock > \
/var/log/$(basename /nix/store/q23p9z1...-disnix-tcp-proxy).log & pid=$!
echo $pid > /var/run/$(basename /nix/store/q23p9z1...-disnix-tcp-proxy).pid
;;
deactivate) 131
kill $(cat /var/run/$(basename /nix/store/q23p9z1...-disnix-tcp-proxy).pid)
rm -f /var/run/$(basename /nix/store/q23p9z1...-disnix-tcp-proxy).pid
;;
lock) 132
if [ -f /tmp/disnix-tcp-proxy-5000.lock ] 133
then
exit 1
else
touch /tmp/disnix-tcp-proxy-5000.lock
if [ -f /var/run/$(basename /nix/store/q23p9z1...-disnix-tcp-proxy).pid ]
then
while [ "$(disnix-tcp-proxy-client)" != "0" ] 134
do
sleep 1
done
fi
fi
;;
unlock) 135
rm -f /tmp/disnix-tcp-proxy-5000.lock
;;
esac
Figure 8.3 The Disnix activation script for the TCP proxy service
Apart from integrating a proxy into the architecture of the system that must
be deployed, we must also implement a Disnix activation script that notifies
the proxy when the activation phase starts and ends.
Figure 8.3 shows how the Disnix activation script for the TCP proxy mod-
ule is implemented. This is a shell script included with the Disnix TCP proxy
service package and invoked through the Disnix wrapper activation script,
which delegates the activation script operations to the bin/wrapper process in-
cluded in the service package:
130 The activate section defines the activation step of the TCP proxy. Here,
the disnix-tcp-proxy service is started, which listens on TCP port 6000 and
forwards the incoming connections to the hello-world-server hosted on machine
test1 listening on TCP port 5000. These values are generated by the
Disnix expression that builds the TCP proxy service. The PID of the
TCP proxy is stored in a separate PID file, which is used to eventually
stop the proxy service.
131 The deactivation section defines the deactivation step of the TCP proxy,
killing the PID of the TCP proxy process.
132 The lock section is invoked before the Disnix transition phase starts.
133 The first step in the lock procedure is to check whether the lock has
been granted to somebody else. If it turns out that a lock has been
granted already, the procedure returns exit status 1, indicating an error.
Otherwise the lock is granted to the requester, by creating a lock file that
instructs the TCP proxy to queue incoming TCP connections.
134 After locking, a loop is executed, which polls the Disnix TCP proxy
through a Unix domain socket connection to check the status. The disnix-
tcp-proxy-client tool returns the number of established connections from the
client to the server. The lock procedure waits until all client connec-
tions are closed (i.e., disnix-tcp-proxy-client returns 0) and then
grants permission. Meanwhile, because the lock file has been created,
the Disnix TCP proxy queues incoming TCP connections.
135 The unlock section defines the unlock procedure. Here, the lock file is
removed so that the disnix-tcp-proxy accepts and forwards incoming con-
nections again.
By running the following command-line instruction with the augmented
architecture, we can achieve fully atomic distributed upgrades:
$ disnix-env -s services-with-proxy.nix -i infrastructure.nix \
-d distribution-with-proxy.nix
When running the procedure, all phases are executed exactly as in the
variant without the proxy until the activation phase has been reached.
If there are any clients connected to the server, then Disnix waits until all
clients have been disconnected. Furthermore, new connections are automati-
cally queued.
If all clients have been disconnected, Disnix executes the activation phase
in which new incoming client connections are queued. When the activation phase
ends, the proxy gets notified and forwards the incoming connections to
the new server component or to the old server component if the upgrade has
failed.
8.4.4 Other examples
So far we have described a very trivial example case to achieve fully atomic
distributed upgrades. We have also experimented with applying the same
techniques to several use cases described in Chapter 4, such as the StaffTracker
system, which uses a web application front-end instead of a command-line
utility for end-user access.
For the latter example case, merely using a proxy is not enough to achieve
full atomic upgrades. After upgrading a web application, users may still run
old versions of a web application in their browsers, which are not automati-
cally refreshed and may lead to several problems. Furthermore, web applica-
tions typically have state attached to a user session, which may get lost after
an upgrade. State may have to be upgraded as well so that it works properly
with the new version of the web application. In order to properly upgrade
web applications, developers have to adapt web applications in such a way
that they can be notified of upgrade events so that they refresh themselves
automatically and that they can migrate their state properly.
Another difficulty is supporting distributed systems in which processes
communicate through stateless protocols, such as UDP. For these applications,
it is non-trivial to determine the number of active connections and to decide
whether it is safe to enter the activation phase.
8.5 D I S C U S S I O N
What we have seen in this chapter is that creating a distributed upgrade pro-
tocol using Nix primitives is a trivial process. The non-trivial part is the last
step in which the transition from the old configuration to the new config-
uration is made final, by deactivating obsolete services and by activating new
services. This phase is imperative and may temporarily bring the system into
an inconsistent state observable by end-users.
Fortunately, the window in which the system is inconsistent is reduced to a
minimum: most steps, such as installing and removing services, can be done
without interference, as Disnix allows components to be safely deployed next
to each other. To completely hide the side-effect, we have developed a simple
TCP proxy and prototype example case that temporarily queues incoming
connections.
To achieve full atomicity, we require cooperation of the system that is being
upgraded, which typically requires developers to implement certain features
and changes, which are not always trivial. Proxies need to be added that are
capable of determining when it is safe to upgrade and that can temporarily queue
connections. Furthermore, other components may have to be notified and
may also need to upgrade their state.
The upgrade protocol also suffers the same weaknesses as the original two-
phase commit protocol. For example, if the coordinator node crashes during
the execution, then the cohorts may be locked forever, until a system admin-
istrator manually intervenes. Fortunately, in our deployment variant, we
only have to lock cohorts in the commit phase as the previous steps can be
safely executed without affecting an existing configuration. Furthermore, as
with the original two phase commit protocol, the distributed commit step is a
series of commit actions as opposed to a single operation, which brings a sys-
tem in an inconsistent state if the process gets interrupted and is not properly
recovered.
Moreover, the time window in which the system is inconsistent can be reduced
further by implementing certain upgrading techniques in the system that
is deployed, such as quiescence [Kramer and Magee, 1990], tranquility [Van-
dewoude et al., 2007] or version-consistency [Ma et al., 2011].
In [Chen et al., 2007], a dynamic update system called POLUS is described
that runs multiple versions of a process simultaneously and redirects function
calls from the old version to the new version using the debugging API. More-
over, it can also recover tainted state by using callback functions. The system
is not used in a distributed environment however. In our approach, we only
run one instance of a particular component and the proxy queues incoming
connections, while the transition process is in progress. However, it may be
possible to make the proxy more sophisticated to support redirection from
old versions to new versions of two instances running simultaneously.
In this chapter, we have only explored how to achieve full atomic upgrades
with Disnix on service level. It is also possible to implement similar tech-
niques to support fully atomic upgrades on infrastructure level, but so far
this has not been done yet.
8.6 C O N C L U S I O N
In this chapter, we have described Nix’s atomic upgrade primitives and how to
map them onto the two-phase commit protocol to achieve atomic dis-
tributed deployment upgrades. As a case study, we have developed a trivial
client-server example case communicating through TCP sockets and we have
extended Disnix.
Although we have achieved our goal with our trivial TCP example, achiev-
ing fully atomic distributed upgrades in general is difficult. The protocol
suffers the same weakness as the two-phase commit protocol and the ap-
proach requires cooperation of the system being upgraded to make upgrades
fully atomic (e.g. implementing techniques such as a proxy), although the
time window in which inconsistency occurs has been reduced to a minimum,
since we can install packages safely next to each other without affecting an
existing configuration. To achieve full atomicity, it is required that developers
make several modifications to the system that is being upgraded, which is not
always trivial.
9
Deploying Mutable Components
A B S T R A C T
In the previous chapters, we have shown Nix, a purely functional package
manager and several extensions, such as Disnix, capable of deploying service-
oriented systems in a network of machines. As we have seen, Nix and its
applications only support deployment of immutable components, which never
change after they have been built. Components are made immutable because
of the underlying purely functional model in order to ensure purity and repro-
ducibility. However, not all components of software systems can be isolated in
a store of immutable objects, such as databases. Currently, these components
must be deployed by other means, i.e. in an ad-hoc manner, which makes
deployment and upgrades of such systems difficult, especially in large net-
works. In this chapter, we analyse the properties of mutable components and
we propose Dysnomia, a deployment extension for mutable components.
9.1 I N T R O D U C T I O N
Throughout this thesis we have seen that for many reasons software deploy-
ment processes have become very complicated and that this complexity is
proportional to the number of machines in a network. Moreover, upgrading
systems is even more challenging and does not always yield the same results as
a clean installation.
In this thesis, we have described several tools providing automated, reliable
and reproducible deployment automation for several deployment aspects. We
have also seen in this thesis, that the fundamental basis of these tools is the
Nix package manager, a package manager borrowing concepts from purely
functional programming languages, such as Haskell. Nix provides a number
of distinct features compared to conventional deployment tools, such as a
domain-specific language to make builds reproducible by removing many
side effects, a Nix store which safely stores multiple versions and variants of
components next to each other, atomic upgrades and rollbacks, and a garbage
collector which safely removes components which are no longer in use.
In the purely functional deployment model of Nix, all components are
stored as immutable file system objects in the Nix store, which never change
after they have been built. Making components immutable ensures purity
and reproducibility of builds. However, not all components of a system can
be managed in such a model, because they may change at run-time, such
as databases. Currently, Nix-related tools manage mutable state in an ad-hoc
manner and outside the Nix store, e.g. by creating scripts which initialise
a database at first startup. As a consequence, system administrators or de-
rec {
stdenv = ...
fetchurl = ...
gtk = stdenv.mkDerivation {
name = "gtk-2.24.1";
src = ...
};
firefox = stdenv.mkDerivation {
name = "firefox-9.0.1";
src = fetchurl {
url = http://../firefox-9.0.1-source.tar.bz2;
md5 = "195f6406b980dce9d60449b0074cdf92";
};
buildInputs = [ gtk ... ];
buildCommand = ''
tar xfvj $src
cd firefox-*
./configure --prefix=$out
make
make install
'';
};
...
}
Figure 9.1 A partial Nix expression
velopers who deploy systems using Nix-related tools still need to manually
deploy and upgrade mutable components, which can be very tedious in large
networks of machines.
In this chapter, we explore the properties of mutable components and we
propose Dysnomia, a generic deployment extension using concepts close to
Nix, to manage the deployment and upgrades of mutable components. This
tool is designed to be used in conjunction with Disnix and other Nix-related
tools.
9.2 T H E N I X D E P L O Y M E N T S Y S T E M
As we have explained earlier in Chapter 2, Nix is designed to automate the
process of installing, upgrading, configuring and removing software packages
on a system. Nix offers a number of distinct features to make deployment
more reliable.
We have seen that one of its key traits is that Nix stores packages in isolation
in a Nix store. Every component in the Nix store has a special file name, such
as /nix/store/324pq1...-firefox-9.0.1, in which the first part, 324pq1..., represents
a cryptographic hash code derived from all build-time dependencies. If, for
example, Firefox is built with a different version of GCC or linked against
a different version of the GTK+ library, the hash code will differ and thus it
is safe to store multiple variants next to each other, because components do
not share the same name. After components have been built, they are made
immutable by removing the write permission bits.
Nix packages are derived from declarative specifications defined in the
purely functional Nix expression language. An example of a partial Nix ex-
pression is shown in Figure 9.1, containing an attribute set in which each
attribute refers to a function building the component from source code and
its dependencies. For instance, the firefox attribute value refers to a function
call which builds Mozilla Firefox from source code and uses GTK+ as a build
time dependency, which is automatically appended to the PATH environment
variable of the builder. The output of the build is produced in a unique direc-
tory in the Nix store, which may be: /nix/store/324pq1...-firefox-9.0.1.
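For illustration, such an expression can be built with the nix-build tool; the file name all-packages.nix is a placeholder for wherever the expression of Figure 9.1 is stored:

    $ nix-build all-packages.nix -A firefox
    /nix/store/324pq1...-firefox-9.0.1

The -A parameter selects the firefox attribute; the resulting store path is printed on standard output and, as explained above, contains the hash over all build-time dependencies.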
While performing a build of a Nix package, Nix tries to remove many
side effects of the build process. For example, all environment variables are
cleared or changed to non-existing values, such as the PATH environment vari-
able. We have also patched various essential build tools such as gcc in such a
way that they ignore impure global directories, such as /usr/include and /usr/lib.
Optionally, builds can be performed in a chroot environment for even better
guarantees of purity.
Because Nix removes as many side effects as possible, dependencies can
only be found by explicitly setting search directories, such as PATH, to Nix
store paths. Undeclared dependencies, which may reside in global directories
such as /usr/lib, cannot accidentally let a build succeed.
The purely functional deployment model of Nix has a number of advan-
tages compared to conventional deployment tools. Because most side-effects
are removed, a build function always gives the same result regardless of the
machine on which the build is performed, giving a very high degree of reproducibility.
Furthermore, we can efficiently upgrade systems, because components with
the same path which are already present do not have to be rebuilt or redownloaded.
Upgrades of systems are also always safe, because newer versions and
variants of components can safely coexist with components belonging to
an older configuration. Finally, there is never a situation in which a package
has files belonging to an old and new version at the same time.
9.3 D I S N I X
As explained in Chapter 4, Disnix extends Nix by providing additional models
and tools to manage the deployment of service-oriented systems in a network
of machines. Disnix uses a services model to define the distributable compo-
nents available, how to build them, their dependencies and their types used
for activation. An infrastructure model is used to describe what machines are
available and their relevant properties. The distribution model maps services
defined in the services model to machines defined in the infrastructure model.
Figure 9.2 shows an example of a service model, describing two distributable
components (or services). testDatabase is a MySQL database storing records.
testWebApplication is a Java web application which depends on the MySQL
database.
Each service defines a number of attributes. The name attribute refers to its
canonical name, dependsOn defines the dependencies of the service on other
{pkgs, system, distribution}:

rec {
  testDatabase = {
    name = "TestDatabase";
    type = "mysql-database";
    pkg = pkgs.testDatabase;
  };

  testWebApplication = {
    name = "TestWebApplication";
    dependsOn = {
      inherit testDatabase;
    };
    pkg = pkgs.testWebApplication;
    type = "tomcat-webapplication";
  };
}
Figure 9.2 A partial Disnix services model
services (which may be physically located on another machine) and pkg de-
fines how the service must be built (not shown here). The type attribute is
used for the activation of the components by invoking a script attached to the
given type. For example, the tomcat-webapplication type indicates that the given
component should be deployed in an Apache Tomcat container. The mysql-
database type is used to deploy an initial MySQL database from a given SQL
file containing table definitions. Disnix supports a number of predefined acti-
vation types as well as an extension interface, which can be used to implement
a custom activation type.
By running the disnix-env command with the three models as command-
line parameters, the complete deployment process of a service-oriented system
is performed, which consists of two phases. In the distribution phase, com-
ponents are built from source code (including all their dependencies) and
transferred to the target machines in the network. Because Nix components
can be safely stored next to existing components of older configurations, we
can safely perform this procedure without affecting an existing system con-
figuration. After this phase has succeeded, Disnix enters the transition phase,
in which obsolete components of the old configuration are deactivated and
new components are activated in the right order. We have seen in the previ-
ous chapter that end-user connections can be blocked/queued in this phase,
so that end-users will not notice that the system is temporarily inconsistent, a
requirement to achieve full atomicity.
In case of a change in a model, an upgrade is performed in which only
necessary parts of the system are rebuilt and reactivated. Disnix determines
the differences between configurations by looking at the hash code parts of
Nix store components. If they are the same in both configurations, they do
not have to be deployed again.
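As an illustration, a deployment with these three models could be started as follows; the model file names are placeholders and the -s, -i and -d parameters are assumed to pass the services, infrastructure and distribution models, respectively:

    $ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Running the same command again after changing one of the models performs the upgrade described above, rebuilding and reactivating only the affected services.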
9.4 M U TA B L E C O M P O N E N T S
The components managed by Nix and Disnix are immutable. Mutable com-
ponents, such as databases, cannot be managed in such a model, apart from
the initial deployment step from a static schema. If a component in the Nix
store were to change, we could no longer guarantee that a deterministic result
is achieved when it is used during a build of another component, because
hash codes may no longer uniquely represent a build result. Apart from the
fact that mutable components change over time, they have a number of other
distinct characteristics compared to immutable components managed in the
Nix store.
First, many mutable component types are hosted inside a container [Dearle,
2007], such as a database management system or an application server. In
order to deploy mutable components, file system operations alone are typically
not enough; the state of the container must also be adapted. For example,
to deploy a database component, the DBMS must be instructed to create a
new database with the right authorisation credentials; to deploy a
Java web application on a Servlet container, the Servlet container must be
notified.
Second, state is often stored on the file system in a format which is close
to the underlying implementation of the container and the given platform.
These files may not be portable and they may have to reside in an unisolated
directory reserved for the container. For example, it may be impossible to
move database state files from a 32-bit instance of a container to a 64-bit in-
stance of a container, or to transfer state data from a little endian machine to
a big endian machine. Furthermore, state files may also contain locks or may
be temporarily inconsistent, because of unfinished operations.
Because of these limitations, many containers have utilities to dump their
state data in a portable format (e.g. mysqldump) and in a consistent manner
(e.g. single transaction) and can reload these dumps back into memory, if
desired. Because such operations may be expensive, many containers can
also create incremental dumps. In this chapter, we refer to the actual files
used by a container as physical state. A dump in a portable format is a logical
representation of a physical state.
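For example, for a MySQL database a consistent, portable logical dump can be produced and restored roughly as follows; the database name is a placeholder and the exact options may differ per setup:

    $ mysqldump --single-transaction testdb > testdb.sql    # logical snapshot in portable SQL form
    $ mysql testdb < testdb.sql                             # restore the logical state into a database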
9.5 D Y S N O M I A
Nix and Disnix provide automated, reliable deployment of static parts of soft-
ware systems and networks of software systems. In this section, we explore
how we can manage mutable software components in conjunction with Nix-
related tools, in an extension which we call Dysnomia.
9.5.1 Managing mutable components
Deployment operations of Nix components are typically achieved by perform-
ing operations on the filesystem. As mentioned in Section 9.4, deployment
of mutable components typically cannot be done by filesystem operations alone.
Therefore, we must interact with tools capable of activating and deactivating
mutable components, as well as with tools capable of generating snapshots
that provide a logical representation of the state. Because these tools differ
per component type, we need to provide an abstraction layer which can be
used to uniformly access all mutable components.
To deploy mutable components, Dysnomia defines an interface with the
following operations:
activate. Activates the given mutable component in the container. Op-
tionally, a user can provide a dump of its initial state.
deactivate. Deactivates the given mutable component in the container.
snapshot. Snapshots the current physical state into a dump in a portable
format.
incremental-snapshot. Creates an incremental snapshot in a portable for-
mat.
restore. Brings the mutable component in a specified state by restoring a
dump.
The actual semantics of the operations depend on the mutable component
type. For example, the activate operation for a MySQL database implies creat-
ing a database and importing an initial schema, whereas the activate operation
for a Subversion repository means creating a Subversion repository. The snap-
shot operation of a MySQL database invokes mysqldump, which stores the current
state of the database in an SQL file, and the snapshot operation for a Subversion
repository invokes svnadmin dump to create a Subversion dump.
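To give an impression of what such an abstraction could look like, the following is a minimal sketch of a module for the mysql-database type. The calling convention (the operation as the first argument, the mutable component as the second) and the way the database name is derived are assumptions for illustration and do not necessarily match Dysnomia's actual interface:

    #!/bin/sh -e
    # Sketch of a Dysnomia-style module for the mysql-database type.
    operation="$1"    # activate | deactivate | snapshot | restore
    component="$2"    # e.g. an SQL file containing the initial schema

    database="$(basename "$component" .sql)"

    case "$operation" in
        activate)
            mysqladmin create "$database"
            mysql "$database" < "$component"      # import the initial schema
            ;;
        deactivate)
            mysqladmin -f drop "$database"
            ;;
        snapshot)
            mysqldump --single-transaction "$database" > "$database-dump.sql"
            ;;
        restore)
            mysql "$database" < "$database-dump.sql"
            ;;
    esac

A module for another type, such as subversion-repository, would implement the same operations in terms of svnadmin create, svnadmin dump and svnadmin load.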
9.5.2 Identifying mutable components
Nix and Nix applications use unique names of components to make deploy-
ment reproducible and to determine how to efficiently upgrade systems. For
example, if a particular component with a hash code already exists, we do not
have to build it again because its properties are the same.
In order to reproduce state in conjunction with static parts of the system,
we need to snapshot, transfer and restore the logical representation of state of
mutable components and we must be capable of distinguishing between these
snapshots based on their names so that we can decide whether we have to
deploy a new generation or whether it can remain untouched. Unfortunately,
we cannot use a hash of build-time dependencies to uniquely address mutable
component generations.
Distribution phase:
  Building components
  Transferring component closures
Transition phase:
  Flipping Nix profiles
  Request lock on services
  Deactivating obsolete components and activating new components
  Request unlock on services
Figure 9.3 Original Disnix deployment procedure

To uniquely identify the logical state of mutable components, several properties
are important. First, we need to know the name of the mutable component,
which refers to the database name or Subversion repository name. Second, we need
to know in which container it is hosted, which we can identify by a container
name and an optional identifier (if there are more instances of the same
container type). Third, we need an attribute to make a distinction between
generations, which must be derived from the container that is being used. For
example, a generation number can be a MySQL binary log sequence number
or a Subversion revision number. A time stamp could also be used if a
container does not have facilities to make a distinction between generations,
although this may have practical implications, such as synchronisation issues.
For example, the logical state of a particular Subversion revision can be identified
by: /dysnomia/svn/default/disnix/34124.
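As a sketch, such an identifier could be derived for a Subversion repository as follows; the repository location and the naming scheme are assumptions for illustration:

    rev=$(svnlook youngest /repos/disnix)        # current revision of the repository
    echo "/dysnomia/svn/default/disnix/$rev"     # e.g. /dysnomia/svn/default/disnix/34124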
9.5.3 Extending Disnix
We have replaced the Disnix activation module system with Dysnomia and extended
the deployment procedure of disnix-env, described in Section 8.3.2, to
include deployment steps for mutable components. The original Disnix de-
ployment procedure is described in Figure 9.3.
To provide support for the deployment of mutable components, we have
to implement several changes to the Disnix deployment procedure. In the
transition phase, in which access from end users can be temporarily blocked,
we first determine the names of the logical state snapshots of mutable compo-
nents. Then we create dumps of all mutable components which have not been
snapshotted yet (which can be determined by checking whether a dump with
the same filename already exists) and we transfer these dumps to the coordinator
machine. After deactivating obsolete services in the old configuration
and activating new services in the new configuration, the state for the muta-
ble components is transferred from the coordinator to their new locations in
the network and activated. Finally, end user access is restored. The updated
deployment procedure is described in Figure 9.4.
Distribution phase:
  Building components
  Transferring component closures
Transition phase:
  Flipping Nix profiles
  Request lock on services
  Snapshot and transfer mutable component states
  Deactivating obsolete components and activating new components
  Import mutable component snapshots
  Request unlock on services
Figure 9.4 Disnix deployment procedure with state transfer
Because creating snapshots of mutable components may be expensive (es-
pecially for large databases) and because access for end users may be blocked,
this procedure may give a large time window in which the system is inaccessible,
which is not always desirable. Therefore, we have also implemented an
optimised version of this procedure using incremental snapshots (for mutable
components capable of doing this). In this improved version, we create dumps
of mutable components before the transition phase, allowing end users to still
access the system. After the dumps have been transferred to their new loca-
tions, we enter the transition phase (in which end-user access is blocked) and
we create and transfer incremental dumps, taking the changes into account
which have been made while performing the upgrade process. In this proce-
dure we still have downtime, but this time window is reduced to a minimum.
The optimised deployment procedure is shown in Figure 9.5.
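As an illustration, an incremental snapshot of a MySQL database could be derived from its binary log, roughly as follows; the log file name is a placeholder and a real module would have to keep track of the position up to which the previous snapshot was taken:

    mysql -e 'FLUSH LOGS'                                        # start a fresh binary log
    mysqlbinlog /var/lib/mysql/binlog.000042 > incremental.sql   # changes since the previous snapshot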
9.6 E X P E R I E N C E
We have developed Dysnomia modules for a number of mutable component
types, which we have listed in Table 9.1. Each column represents a separate
aspect of the module, such as the container for which it has been designed,
the semantics of the activation, deactivation, snapshot and restore operations
and which attribute is used to make a distinction in naming the logical state
snapshots. Most of the modules are designed for deployment of databases,
which instruct the DBMS to create or drop a database and create snapshots
and restore snapshots using a database administration tool. Additionally, we
have implemented support for Subversion repositories which use a revision
number to distinguish between generations. Ejabberd uses a database for
Distribution phase:
  Building components
  Transferring component closures
Transition phase:
  Snapshot and transfer mutable component states
  Flipping Nix profiles
  Request lock on services
  Snapshot and transfer incremental mutable component states
  Deactivating obsolete components and activating new components
  Import full and incremental mutable component snapshots
  Request unlock on services
Figure 9.5 Optimised Disnix deployment procedure with state transfer
storing user account settings, but has no support for incremental backups and
a timestamp is the only way to make a distinction between generations. We
have also implemented support for several other types of components, such
as web applications and generic UNIX processes. These components typically
do not change, but they still need to be activated in a container, whose state
must be modified.
We have applied Dysnomia in conjunction with Disnix to the case studies
covered in Chapter 5, including the StaffTracker toy systems, ViewVC
(http://viewvc.tigris.org) and SDS2, the industrial case study from Philips
Research. We have performed various redeployment scenarios, in which the
state of mutable components had to be captured and transferred. For the
Philips case study, redeployment took a long time in certain scenarios as the
size of a number of databases was quite large (the total size of the entire
system is 599 MiB). In the worst case, a simple upgrade may take almost as much
time as a clean installation, which is about 270.5 seconds if the data set is
unchanged¹.
9.7 R E L AT E D W O R K
Many deployment tools support deployment of mutable state to some degree,
although there is no consensus on how to handle mutable components. Conventional
package managers, such as RPM, modify the state of parts of a system by using
¹ Although this absolute number is still acceptable by today's standards, it is proportional
to the amount of data, which is not acceptable. The absolute number is not that high, because the
SDS2 data set is still relatively small.
maintainer scripts invoked during the installation of a package. The purpose
of maintainer scripts is to "glue" files from a package to files already residing
on the system. The disadvantage is that they imperatively modify parts of
the system, which cannot be trivially undone, and the operations depend on
the current state of the system. As a consequence, they may produce different
results on other machines. The Mancoosi project investigates how to provide
transactional upgrades of packages with maintainer scripts, by means of a
system model and a DSL to implement maintainer scripts [Cicchetti
et al., 2010]. By simulating whether an upgrade may succeed, they can be
made transactional. To perform rollbacks, inverse operations can be derived.
Also other model-driven deployment frameworks such as Cfengine [Burgess,
1995] (and derivatives) and the IBM Alpine project [Eilam et al., 2006] are ca-
pable of dealing with mutable state, although in an imperative and ad-hoc
manner.
An earlier attempt at managing mutable state with Nix was made
in [den Breejen, 2008], in which a Copy-On-Write (COW) filesystem is used to
capture the mutable state of containers in constant time. Although this is more
efficient, only the physical state can be captured, with all its disadvantages,
such as the fact that it may be inconsistent and non-portable. Also, the state
of a container is captured as a whole, while it may also be desirable to treat
parts of the container state as separately deployable mutable components (e.g.
databases).
9.8 D I S C U S S I O N
Dysnomia provides a generic manner for deploying mutable components and
abstracts over physical and logical state. The major disadvantage of this ap-
proach is the use of the filesystem as an intermediate layer. For certain types of
mutable components this works acceptably. For large databases, a distributed
deployment scenario can be very costly, because it may take a long time to
write a snapshot to a disk, transfer the snapshot and to restore it again, even
for incremental dumps. Many modern database management systems have
replication features, which can perform these tasks more efficiently. However,
database replication is difficult to integrate in the Nix deployment model.
9.9 C O N C L U S I O N
In this chapter, we have analysed issues with deployment and upgrades of
mutable components in conjunction with Nix and we have described Dys-
nomia, an extension for deployment of mutable components. With Dysno-
mia, mutable components can be conveniently deployed in a generic manner.
A drawback is that redeployment may be very costly for certain component
types, such as very large databases.
To counter the high costs of creating snapshots and transferring state, LBFS
[Muthitacharoen et al., 2001], a low-bandwidth network file system, may
be worth investigating. As a final note, Dysnomia has not yet been applied in
conjunction with NixOS and Charon.
Type | Container | Activation | Deactivation | Snapshot | Restore | Ordering
mysql-database | MySQL | Create database | Drop database | mysqldump | mysql import | Binlog sequence
postgresql-database | PostgreSQL | Create database | Drop database | pg_dump | psql import | Binlog sequence
subversion-repository | Subversion | Create repository | Drop repository | svnadmin dump | svnadmin load | SVN revision
ejabberd-database | Ejabberd server | Init database | Drop database | ejabberdctl backup | ejabberdctl restore | Timestamps
tomcat-webapplication | Apache Tomcat | Symlink directory | Remove symlink | - | - | -
apache-webapplication | Apache HTTPD | Symlink WAR file | Remove symlink | - | - | -
process | init/Upstart | Symlink job | Remove symlink | - | - | -
Table 9.1 A collection of Dysnomia types and their properties
10
Discovering License Constraints using
Dynamic Analysis of Build Processes
A B S T R A C T
With the current proliferation of open source software components, intellec-
tual property in general, and copyright law in particular, has become a critical
non-functional requirement for software systems. A key problem in license
compliance engineering is that the legal constraints on a product depend on
the licenses of all sources and other artifacts used to build it. The huge size of
typical dependency graphs makes it infeasible to determine these constraints
manually, while mistakes can expose software distributors to litigation. In
this chapter we show a generic method to reverse-engineer this information
from the build processes of software products by tracing system calls (e.g.,
open) to determine the composition graph of sources and binaries involved in
build processes. Results from an exploratory case study of seven open source
systems, which allowed us to discover a licensing problem in a widely used
open source package, suggest our method is highly effective.
10.1 I N T R O D U C T I O N
Today’s software developers often make use of reusable software components,
libraries, and frameworks. This reuse can be observed in diverse software
systems, from tiny to massive (e.g., LibreOffice), as well as in industrial, gov-
ernment, and academic systems. Reuse can greatly improve development
efficiency, and arguably improves quality of delivered systems.
With the reuse of software, the question of under what license a binary should be
released becomes very important. If code is written entirely by a developer,
then the developer is free to choose whatever license he wants. However,
when reusing existing third party components in a binary, the final license or
licenses (if any [Hemel et al., 2011; German and Hassan, 2009]) that govern
this binary are determined by the licenses under which these components were
released, plus the way these components are combined. Failing to comply
with license terms could result in costly lawsuits from copyright holders.
To correctly determine the license of an executable, we should know three
things:
How are source files combined into a final executable (e.g. static linking,
dynamic linking)?
What licenses govern the (re)use of source files that were used to build
the binary?
How can we derive the license of the resulting binary from the build
graph containing the source files?
Our goal in this chapter is to provide an answer for the first question. Iden-
tifying what goes into an executable is not straightforward. An executable is
created by using other executables (such as compilers or code generators), and
by transforming source code and other binary components (such as libraries)
into the executable.
The process of creating a binary is usually driven by a build tool, such as
make, cmake, ant, etc. and it is documented in a build recipe, such as a Makefile.
To complicate identifying the binary provenance of an executable, the build
process might also use a set of configuration parameters (“configure flags”) to
enable or disable specific features that will affect what is ultimately included
in the executable. Thus, the way in which a binary is built may change the licensing
constraints on that binary.
The necessary information cannot, in general, be derived by analysing
build recipes such as Makefiles, because they tend to have incomplete de-
pendency information. Therefore, we reverse-engineer the build process of the
binary by deriving its build dependency graph by tracing system calls made
by processes during the building of a binary. These system calls reveal which
files were read and written by each process, and thus allows a dependency
graph to be constructed.
The structure of this chapter is as follows. In Section 10.2, we motivate the
problem of determining licenses of binaries and give a real-world example of
a library whose license is determined by its build process. In Section 10.3, we
argue that static analysis of build processes is infeasible, and that therefore
we need a dynamic approach based on tracing of actions performed during
a build. We explain our system call tracing method in Section 10.4. To deter-
mine the feasibility of this method we applied it to a number of open source
packages (Section 10.5). This analysis revealed a licensing problem in FFmpeg,
a widely used open source component, which was subsequently confirmed
by its developers. We discuss threats to the validity of our approach in Sec-
tion 10.6. In Section 10.7 we cover related work and Section 10.8 concludes
this chapter.
10.2 A M O T I VAT I N G E X A M P L E
To motivate the need to know what goes into a binary, we use FFmpeg as an
example: a library to record, convert and stream audio and video, which has
been used in both open source and proprietary video players. FFmpeg
has been frequently involved in legal disputes¹.
Open Source components are available under various licenses, ranging
from simple permissive software licenses (BSD and MIT), which allow developers
to include code in their products without publishing any source code,
to more strict licenses imposing complex requirements on derived works
(GPLv2, GPLv3, MPL). Unfortunately it is not always easy or even possible
to combine components that use different licenses. This problem is known as
license mismatch [German and Hassan, 2009]. Extra care needs to be taken
to make sure that license conditions are properly satisfied when assembling a
software product.

¹ See http://en.wikipedia.org/wiki/List_of_software_license_violations
Incorrect use of open source software licenses opens developers up to liti-
gation by copyright holders. A failure to comply with the license terms could
void the license and make distribution of the code a copyright violation. In
Germany several cases about non-compliance with the license of the Linux
kernel (D-Link², Skype³, Fortinet⁴) were taken successfully to court [ifrOSS,
2009]. In the US there have been similar lawsuits involving BusyBox [Software
Freedom Law Center, 2008].
The license of our motivating example differs depending on the exact files
the consuming applications depend on. As stated in FFmpeg’s documentation:
Most files in FFmpeg are under the GNU Lesser General Public License version
2.1 or later (LGPL v2.1+). [...] Some other files have MIT/X11/BSD-style
licenses. In combination the LGPL v2.1+ applies to FFmpeg.
Some optional parts of FFmpeg are licensed under the GNU General Public
License version 2 or later (GPL v2+). See the file COPYING.GPLv2 for details.
None of these parts are used by default, you have to explicitly pass --enable-gpl
to configure to activate them. In this case, FFmpeg’s license changes to GPL
v2+.
This licensing scheme creates a potential risk for developers reusing and
redistributing FFmpeg, who want to be absolutely sure there is no GPLv2+
code in the FFmpeg binary they link to. Evidence for this can be found in the
FFmpeg forums⁵.
The reason why this is important is that if FFmpeg is licensed under the
LGPL, it can be (dynamically) linked against code under any other
license (including proprietary code). But if it is licensed under the GPLv2+,
it would require any code that links to it to be licensed under a GPLv2- or
GPLv3-compatible license.
FFmpeg’s source code is divided into 1427 files under the LGPLv2.1+, 89
GPLv2+, 18 MIT/X11, 2 public domain, and 7 without a license. To do proper
legal due diligence, an important question to answer is: if the library is built to
be licensed under the LGPLv2+ (i.e., without --enable-gpl), is there any file
under the GPLv2+ used during its creation (and potentially embedded into
the binary)? If there is, that could potentially turn the library GPLv2+.
Given the size of components such as FFmpeg, manual analysis of the entire
code is infeasible. Therefore, we need an automated method to assist in this
process.
² 2-6 O 224/06 (LG Frankfurt/Main)
³ 7 O 5245/07 (LG München I)
⁴ 21 O 7240/05 (LG München I)
⁵ http://ffmpeg.arrozcru.org/forum/viewtopic.php?f=8&t=821
patchelf: patchelf.o
	g++ patchelf.o -o patchelf

patchelf.o: patchelf.cc
	g++ -c patchelf.cc -o patchelf.o \
	    -DENABLE_FOO

install: patchelf
	install patchelf /usr/bin/
Figure 10.1 A simple Makefile
10.3 G E N E R A L A P P R O A C H : T R A C I N G B U I L D P R O C E S S E S
As we saw in the previous section, it is important to be able to identify what
source files were involved in building a binary, and how they were composed
to form the binary.
10.3.1 Reverse-Engineering the Dependency Graph
Our goal therefore is to reverse-engineer the build dependency graph of a binary,
which should describe how the source files were combined to form the final
binary. Formally, the dependency graph is defined as G = (V, E), where
V = V_t ∪ V_f, the union of task nodes V_t and file nodes V_f. A task node
v_t ∈ V_t denotes a step in the build process. (The granularity of a task depends on the
level of detail of the reverse-engineering method; they could be Make actions,
Unix processes, Ant task invocations, and so on.) It is defined as the ordered
tuple ⟨id, ...⟩, where id ∈ Ids is a symbol uniquely identifying the task, plus
auxiliary information such as the name of the program(s) involved in the task
or the name of the Makefile rule. A file node v_f ∈ V_f is a tuple ⟨id, path⟩,
where id ∈ Ids is a symbol uniquely identifying the file node or the special
value ε, and path is the file's path⁶.
So how do we reverse-engineer the dependency graph for an existing prod-
uct? It might seem that this is simply a matter of inspecting the build code
of the product, that is, the scripts and specification files responsible for au-
tomatically constructing the installable binary artifacts from the source code.
For instance, for a product built using Make [Feldman, 1979], we could just
study the Makefile. As a simple running example, consider PatchELF, a small
utility for manipulating Unix executables⁷. It consists of a single C++ source
file, patchelf.cc. Building this package installs a single binary in /usr/bin/patchelf.
Suppose that we want to find out what source files contributed to the build
of this binary. To do this, we might look at its Makefile, shown (simplified) in
Figure 10.1. If we only use the information included in the Makefile, follow-
ing the rules in reverse order from the binary patchelf, we conclude that patchelf
⁶ We explain in Section 10.4.3 why path does not uniquely identify file nodes and id is necessary to disambiguate them.
⁷ http://nixos.org/patchelf.html
[Graph: patchelf.cc → g++ → patchelf.o → g++ → patchelf → install → /usr/bin/patchelf]
Figure 10.2 Hypothetical dependency graph derived from an analysis of the Make-
file in Figure 10.1. Make actions are depicted as yellow rectangles and files as
ovals; input files are blue, and final binaries are in green; and temporarily created
files are in grey.
#ifdef ENABLE_FOO
#include <iostream>
#endif

int main() {
#ifdef ENABLE_FOO
    std::cout << "Hello, world!" << std::endl;
#endif
    return 0;
}
Figure 10.3 Example source code demonstrating an optional dependency
has a single source file: patchelf.cc. Figure 10.2 shows the dependency graph
resulting from this analysis of the Makefile.
10.3.2 Why We Cannot Use Static Analysis
In fact, the graph in Figure 10.2 is highly incomplete. Both manual inspection
and automatic static analysis of the build code are unlikely to produce correct,
complete dependency graphs. There are a number of reasons for this:
Makefiles are usually not complete
Makefiles do not list every dependency required. For instance, the file patchelf.cc
uses several header files, such as system header files from the C++ compiler
and C library, but also a header file elf.h included in the PatchELF source dis-
tribution (which was actually copied from the GNU C Library). Inspection
of the build files does not reveal these dependencies; it requires a (language-
dependent) analysis of the sources as well. Similarly, the linker implicitly
links against various standard libraries. Also note that Makefile rules do not
list all of their outputs: the install rule does not declare that it creates a file in
/usr/bin. Therefore a static analysis of the Makefile will miss many inputs and
outputs.
Build files are numerous, large, and hard to understand
Large projects can have thousands of build-related files, which exhibit a sig-
nificant amount of code churn [McIntosh et al., 2011; de Jonge, 2005]. This
makes manual inspection infeasible; only automated methods are practical.
Dependencies might be the result of many other build processes
If the binary is created using files that are not part of the project itself (such
as external dependencies like libraries), this process has to be repeated
(recursively) to determine which files were used to create those files.
Code generators might make it difficult to follow the build process
For example, a build might use a parser generator (such as yacc). Therefore
one might need to do static analysis of the parser's source files (*.y
files) and know the type of dependencies required by the generated code. This
is complicated if the source code has ad-hoc code generators. In such cases,
their output would have to be analysed to determine what is needed to
build it.
Configuration options and compilation flags
Most build-configuration tools (Autoconf/Automake, cmake, qmake, etcetera)
allow customization of the build process by analyzing the local environment
where the system is being built: what libraries are installed, what their locations
are, and what features should be enabled or disabled. In some cases, this build-
configuration process might determine which of two alternative libraries will
be used (e.g. one is already installed in the target system). In essence, this
process would require the execution of the build configuration process to de-
termine the actual instance of the Makefile to be used. Such a Makefile might
further contain compilation flags that would require the dynamic analysis of
the creation of the final binary. This kind of conditionality is widely used
in open source systems, where the person building a product might enable
or disable various options, usually depending on which libraries are already
installed in the system where the product is expected to run.
For example, assume that the source code of patchelf.cc is as shown in Fig-
ure 10.3, and we attempt static analysis to determine any other dependencies
that it might have. However, the compiler flag -DENABLE_FOO indicates that
the compiler should enable the preprocessor define ENABLE_FOO, and there-
fore, any static analysis would require that this be taken into consideration.
This directive might include or exclude dependencies; in the example, en-
abling it causes a dependency on the header file iostream.
There are many build systems
Many different build systems are in use, such as Ant, Maven, countless vari-
ants of GNU Make, language-specific scriptable formalisms such as Python’s
Setuptools, and code generation layers above other build systems such as
the GNU Autotools and Perl’s MakeMaker. In practice, a static analysis tool
would be limited to a few of them at most; and build systems that use general-
purpose languages (such as Autoconf scripts or Python Setuptools) for speci-
fying build processes are practically impossible to analyse.
10.3.3 Dynamic Analysis through Tracing
Given all these issues, the only effective way to determine the source prove-
nance of a binary that we can hope for is to perform dynamic analysis of the
build process. That is, given concrete executions of the build systems that
produce the binary under scrutiny and its dependencies, we wish to instru-
ment those executions to discover the exact set of files that were used during
the build, and the manner in which they were used (e.g., used by a compiler,
a linker, or during the installation or configuration steps). The result of such
an instrumented build is a trace of files used to create the binary. This trace
can be annotated with licensing information of each of the nodes, and further
analysed to do license auditing of the binary.
Previous approaches to tracing have instrumented specific build tools such
as GCC and the Unix linker [Tuunanen et al., 2009] or used debug instrumen-
tation already present in GNU Make [Adams et al., 2007]. These methods are
limited to specific tools or require correct dependency information in Make-
files. In the next section, we present a generic method intended to work for
all tools used during a build.
10.4 M E T H O D : T R A C I N G T H E B U I L D P R O C E S S T H R O U G H
S Y S T E M C A L L S
A tracing of the build process seeks to discover the inputs and the outputs of
each step in a build process in order to produce a dependency graph. That
is, the instrumentation of the build process must log which files are read and
written by each build step, respectively. Therefore, if we can determine all
file system accesses made by a build process, we will be able to produce the
dependency graph of the binary being built.
Instead of instrumenting each program used during the build process, we
have decided to instrument the kernel. Any well-behaved process run by the
kernel is expected to use system calls to open, read, write and close files. If we
can “listen” to these system calls we could determine, for any build process:
a) the programs being executed, b) the files being read, and c) the files being
generated.
10.4.1 Overview
In this section, we describe one implementation of this idea: namely, by log-
ging all system calls made during the build. System calls are calls made by
user processes to the operating system kernel; these include operations such
as creating or opening a file. Thus, logging these calls allows us to discover
the inputs and outputs of build steps at the granularity of operating system
processes or threads. Thus, in essence, we determine the data flow graph be-
tween the programs invoked during the build. A nice aspect of this approach
is that the necessary instrumentation is already present in most conventional
operating systems, such as Linux, FreeBSD, Mac OS X and Windows.
A major assumption this approach makes is that if a file is opened, it is
used to create the binary, either directly (it becomes part of the binary, such
as a source code file) or indirectly (it is used to create the binary, such as a
configuration file). Not all files opened will have an impact on the licensing of
the binary that is created, but we presume that because they might have an
impact, it is necessary to evaluate how each file is used, and if the specific use
has any licensing implications for the final binary.
Consider, for instance, tracing the system calls made during the execution
of the command make install with the Makefile shown in Figure 10.1. On Linux,
this can be done using the command strace⁸, which executes a program and
logs all its system calls. Thus, strace -f make install prints on standard error
a log of all system calls issued by make and its child processes. (The flag -f
causes strace to follow fork() system calls, i.e., to trace child processes.) Make
first spawns a process running gcc to build patchelf.o. Since this process creates
patchelf.o, it will at some point issue the system call
open("patchelf.o",
O_RDWR|O_CREAT|O_TRUNC, 0666)
Likewise, because it must read the source file and all included headers, the
trace will show system calls such as
open("patchelf.cc", O_RDONLY)
open("/usr/include/c++/4.5.1/string",
O_RDONLY|O_NOCTTY)
open("elf.h", O_RDONLY)
Continuing the trace of Make, we see that the linker opens patchelf.o for
reading (as well as libraries and object files such as /usr/lib/libc.so,
/usr/lib/libc_nonshared.a and /usr/lib/crtn.o) and creates ./patchelf, while finally the install target
opens ./patchelf for reading and creates /usr/bin/patchelf. Thus we can derive a
dependency graph for the final, installed binary /usr/bin/patchelf. The nodes in
this graph are files and the processes that read, create or modify files. In the
graph for PatchELF, there will therefore be a path from the nodes patchelf.cc
and elf.h to /usr/bin/patchelf, revealing the dependencies of the latter.
Performing a trace on just one package reveals the direct dependencies of
that package, e.g., files such as patchelf.cc and /usr/lib/libc_nonshared.a. Since the
latter is a binary file that gets linked into the patchelf program, for the purposes
of license analysis, we would like to know what source files were involved
in building libc_nonshared.a. For example, if this static library contains a file
distributed under a non-permissive copyleft license, then this can have legal
consequences for the patchelf binary.
Therefore, we can perform the same trace analysis on external dependen-
cies of PatchELF, such as Glibc (the package that contains libc_nonshared.a). The
result will be a large dependency graph showing the relationships between
packages. For instance, in this graph, some source files of Glibc have paths to
/usr/lib/libc_nonshared.a, and from there, to /usr/bin/patchelf.
⁸ http://strace.sourceforge.net/
[Diagram: for each package (PatchELF, FFmpeg, Glibc, ...), a straced build of the sources produces a trace; a trace analyser turns each trace into the dependency graph stored in a SQLite database, which a license analyser uses to produce a license report.]
Figure 10.4 Overview of the system call tracing approach. Nodes denote artifacts,
and edges denote the tools that create them.
Figure 10.4 summarises our approach. For each project we’re interested in
and its dependencies, we perform a complete build of the package. This can
typically be done fully automatically by using package management systems
such as RPM [Foster-Johnson, 2003]; all we need to do is to trace the package
build with strace. A trace analyser is then applied to each trace to discover
the files read and written by each process. This information is stored in a
database, which, in essence, contains a dependency graph showing how each
file in the system was created. This dependency graph can then be used by
license analysis tools (which we do not develop in this chapter). For instance,
for a given binary, such a tool could use the dependency graph to identify the
sources that contributed to the binary, determine each source file’s license us-
ing a license extraction tool such as Ninka [German et al., 2010], and compute
the license on the binary.
We now discuss in detail how we produce a system call trace, how we
analyse the trace to produce the dependency graph, and how we can use the
dependency graph to answer queries about binaries.
10.4.2 Producing the Trace
Producing a system call trace is straightforward: on Linux, we essentially just
run strace -f /path/to/builder, where builder is the script or program that builds the
package.
For performance reasons, we only log a subset of system calls. These are
the following:
All system calls that take path arguments, e.g., open(), rename() and ex-
ecve().
System calls related to process management. As we shall see below, it’s
necessary to know the relationships between processes (i.e. what the
parent process of a child is), so we need to trace forks. On Unix sys-
tems, forking is the fundamental operation for creating a new process:
it clones the current process. The child process can then run execve() to
load a different program within its address space; the underlying system
calls responsible for this are clone() and vfork().
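A possible invocation restricting strace to such a subset looks roughly as follows; the exact list of system calls is an assumption and would have to be adapted to the platform and strace version:

    $ strace -f -s 256 \
        -e trace=open,creat,rename,unlink,execve,chdir,clone,vfork,fork,arch_prctl \
        -o build.trace \
        /path/to/builder

The -f flag follows child processes, -s enlarges the logged string arguments so that paths are not truncated, and -o writes the trace to a file instead of standard error.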
10.4.3 Producing the Build Graph
The trace analyser reads the output of strace to reverse-engineer the build de-
pendency graph. This graph is defined as in Section 10.3.1, where each task
represents a process executed during the build. Task nodes are extended to
store information about processes: they are now a tuple ⟨id, program, args⟩,
where program is the path of the last program executed by the task (i.e., the
last binary loaded into the process's address space by execve(), or the program
inherited from the parent) and args is the sequence of command-line arguments.
The analyser constructs the graph G by reading the trace, line by line.
When a new process is created, a corresponding task node is added to V_t.
When a process opens a file for reading, we add a file node v_f to V_f (if it
didn't already exist), and an edge from v_f to the task node. Likewise, when
a process opens a file for writing, we add a file node v_f to V_f, and an edge
from the task node to v_f.
[Graph: patchelf-0.5.tar.bz2 → tar → patchelf.cc, elf.h; patchelf.cc → cc1plus → ccoPovgb.s → as → patchelf.o → ld → patchelf → install → patchelf-0.5/bin/patchelf → strip → patchelf-0.5/bin/patchelf. The ld step additionally reads crtbegin.o, crtend.o and further files from GCC, Glibc and the Linux headers (external dependencies, elided).]
Figure 10.5 PatchELF dependency graph. External dependencies are depicted in
white.
There are some tricky aspects to system call traces that need to be taken
into account by the analyser:
System calls can use relative paths. These need to be absolutised with
respect to the current working directory (cwd) of the calling process. As
this requires us to know the cwd of each process, we need to process
chdir() calls. Since the cwd is inherited by child processes, the analyser
must also keep track of parent/child relations, and propagate the cwd of
the parent to the child when it encounters a fork operation. The function
absolutise(dir, path) returns the absolute representation of path relative to
dir.
System calls can be preempted, causing the return value of a system
call to be separated from its arguments. This is not usually a problem:
when we encounter an unfinished system call, we could remember the
arguments until we encounter the resumption, where we combine the
arguments with the return value and process the system call normally.
However, for forks, the child may start to execute before the parent has
returned from its fork() call. Thus, to propagate the cwd to the child,
we need to know the new process’s parent before processing any system
calls from the child. This is done in a preprocessing step (not shown)
by replacing any unfinished trace line with the combination of that line
and its subsequent resumption, and removing the latter.
Process IDs (PIDs) can wrap around; in fact, this is quite likely in large
packages because PIDs are by default limited to 32,768 in Linux. There-
fore the analyser needs to distinguish between different processes with
the same PID. For this reason, task IDs are not equal to PIDs; rather, a
unique task ID is generated in the graph every time a process creation
is encountered in the trace.
A special problem is the representation of files that are written mul-
tiple times, i.e., are created or opened for writing by several different
processes. This is quite common. For instance, it is usual in the make
install step to remove debug information from binaries by running the
strip command. This means that after linking, the binary patchelf is first
copied from the temporary build directory to /usr/bin/patchelf by the com-
mand install, and then read and recreated by strip. Represented naively,
this gives the following graph:
install → /usr/bin/patchelf ⇄ strip
Such cycles in the graph make the data flow hard to follow, especially if
there are multiple processes that update a file: the order is impossible
to discern from the graph. Therefore, each modification of a file creates
a new node in the graph to represent the file, e.g.,
install → /usr/bin/patchelf → strip → /usr/bin/patchelf
These nodes are disambiguated in the graph by tagging them with the
task ID that created them.
Note that this approach allows “dead” writes to a file to be removed
from the graph, analogous to the discovery of dead variables in data
flow analysis. For instance, in the build script echo foo > out; echo bar
> out, the second process recreates out without reading the file created
by the first process. Thus, there is no edge in the graph exiting the
first out node, and its task is not equal to the creator of the out node,
so it can be safely removed from the graph. This is an instance of the
notion developed in [Dolstra et al., 2004] that many concepts in memory
management in the programming languages domain can be carried over
to the domain of the storage of software on disk.
We are currently not interested in the dynamic library dependencies of
tools used during the build, because these generally do not have license
implications. For instance, every program executed during the build
opens the GNU C library, libc.so.6 (almost every program running under
Linux uses it), and therefore we are not interested in these. However,
we do care if the linker is the one opening a library when it's creating a
binary (during a separate, later call to open()). So to simplify the graph,
we omit these dependencies. To detect calls to open() made due to dy-
namic linking, we use the following method: we observed that after the
dynamic linker has opened all shared libraries, it issues an arch_prctl()
system call. Thus, we ignore any opens done between a call to execve()
and arch_prctl().
Let us return to the PatchELF example described in Section 10.3. Building
this package has several automated steps beyond running make install. As with
most Unix packages, building the PatchELF package starts with unpacking
the source code distribution (a file patchelf-0.5.tar.bz2), running its configure
script, running make and finally running make install.
Figure 10.5 shows the dependency graph resulting from analysing the trace
of a build of this package. It is instructive to compare this dependency graph
with the “naive” one in Figure 10.2. The system call trace reveals that com-
piling patchelf.cc causes a dependency not only on elf.h, but also on numerous
header files from Glibc and the Linux kernel. Likewise, the linker brings in
several object files and libraries from Glibc and GCC. (Many header and li-
brary dependencies that are detected by system call tracing have been omitted
from the figure for space reasons.) The analysis also reveals that the compi-
lation of patchelf.cc is actually done by two processes, which communicate
through an intermediate temporary assembler file, ccoPovgb.s; and that the
installed executable is rewritten by strip.
In our implementation, the analyser stores the resulting graph in a SQLite
database. As sketched in Figure 10.4, other tools can use this database to
answer various queries about a binary. Multiple traces can be dumped into
the same database, creating a (possibly disconnected) graph consisting of the
output files of each package and the processes and files that created them.
This allows inter-package dependencies to be analysed.
10.4.4 Using the Build Graph
Once we have the full dependency graph for a binary, we can perform various
queries. The most obvious query is to ask for the list of source files that
contributed to a binary x ∈ V_f. Answering this question is not entirely trivial.
It might seem that the sources are the files in the graph that contributed to
the binary and were not generated themselves, i.e., they are source vertices in
V_f from which there is a path to x:

{ v_f ∈ V_f | deg⁻(v_f) = 0 ∧ there is a path in G from v_f to x }
But as Figure 10.5 shows, this does not yield the files in which we’re in-
terested: the “source files” returned are source distributions such as patchelf-
0.5.tar.bz2, from which a tar task node produces files such as patchelf.cc. (Addi-
tional “sources” in this graph are the header files, object files and libraries in
Glibc, GCC and the kernel. However, if these packages are traced and added
to the dependency graph, then these files are no longer sources; instead, we
get more source distributions such as glibc-2.12.2.tar.bz2.) While this answer is
strictly speaking correct (patchelf-0.5.tar.bz2 is the sole source of the package),
it is not very precise: we want to identify specific files such as patchelf.cc.
To get the desired results, we simply transform the full dependency graph
into one from which task nodes such as tar and its dependencies have been
removed. For PatchELF, this yields a set
{patchelf.cc, elf.h, errno-base.h, errno.h, . . . }
On this set, we can apply Ninka to determine the licenses governing the
source files.
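As a sketch of how part of such a query could look against the SQLite database, assuming a hypothetical schema with files and edges tables (the actual schema of our implementation may differ), the candidate source files are those that no task has written; the reachability test towards the binary of interest and the removal of unpacking tasks such as tar are then performed by a graph traversal on top of this:

    $ sqlite3 trace.db "SELECT path FROM files f
        WHERE NOT EXISTS (SELECT 1 FROM edges e WHERE e.target = f.id);"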
More interesting queries can take into account how files are composed. For
instance, given a binary, we can ask for all the source files that are statically
linked into the binary. Here the graph traversal does not follow dynamic li-
brary inputs to ld nodes. It does follow other task nodes, so for instance bison
nodes are traversed into their .y sources.
10.4.5 Coarse-grained Processes
Because system call tracing identifies inputs and outputs at the granularity of
Unix processes, the main assumption underlying this approach to tracing is
that every process is part of at most one conceptual build step. However, this
is not always the case. Consider the command cp foo bar /usr/bin/. This com-
mand installs two files in /usr/bin. Conceptually, these are two build steps. But
since they are executed in a single process, we get the following dependency
graph:
{foo, bar} → cp → {/usr/bin/foo, /usr/bin/bar}
This is undesirable, because foo and its sources now appear as dependencies
of /usr/bin/bar, even if the two programs are otherwise unrelated. We call these
processes coarse-grained.
In cases such as cp (and similar commands such as install), we can fix this
problem on an ad hoc basis by rewriting the graph as follows. We scan the
dependency graph for cp task nodes (and other tasks which we know may be
coarse-grained) that have more than one output, where each output file
has the same base name as an input file. We then replace this task node with
a set of task nodes, one for each corresponding pair of inputs and outputs,
e.g.
bar → cp → /usr/bin/bar
foo → cp → /usr/bin/foo
Note that both cp tasks in this graph correspond to the same process in the
trace. Since coarse-grained processes are a threat to the validity of our ap-
proach, we investigate how often they occur in the next section.
10.5 E VA L U AT I O N
The goal of this evaluation is to demonstrate, first, the correctness of our
method and, second, its usefulness.
10.5.1 Correctness
We performed an experiment to determine how often our method produces
false negatives (RQ1) and false positives (RQ2) on a number of open source
packages, including FFmpeg, patchelf and Apache Xalan, a Java-based XSLT
library that uses the Apache Ant build system. A false negative occurs when a
file is a dependency of a bi-
nary (i.e., necessary to build the binary) but does not appear in the graph as
such. Conversely, a false positive occurs when a file appears as a dependency
of a binary in the graph, but is not actually used to build the binary.
To measure this in an objective manner, we used the following method.
First, for a given binary, we use our tracer to determine the dependency graph.
Package     Files "unused"   Files "used"   False negatives   False positives   Recall   Precision
patchelf              19              3                  0                 1      1.00        0.67
aterm                 60             57                  1                 0      0.98        1.00
ffmpeg               986           1253                  0                 7      1.00        0.99
opkg                   9             97                  1                 3      0.99        0.97
openssl             1180            847                  0                 5      1.00        0.99
bash                 806            280                  0                34      1.00        0.88
xalan                379            955                  0                 4      1.00        1.00
Table 10.1 RQ1 & RQ2: False negatives and positives
This gives us a list of source files from the package’s distribution that were
supposedly used in the build, and a list of source files that were not. Then:
To determine false negatives (RQ1), for each presumptively unused file,
we rebuild the package, deleting the file after the configure step.
If the package then fails to build or gives different output, the file ap-
parently is used, so it is flagged as a false negative. If the package builds
the same, then it is a true negative.
To determine false positives (RQ2), for each assumed used file, we like-
wise rebuild the package after deleting the file. If the package still builds
correctly, then the file apparently is not used by the build, so it is a false
positive. Otherwise it is a true positive.
We used some optimisations in our experiment. For instance, for RQ1, we can
delete all unused files at the same time. If the package still builds correctly,
then there are no false negatives.
Table 10.1 shows the results of our experiment to determine false negatives
and positives. For each package in our evaluation, it shows the number of
apparently unused and used files (i.e. non-dependencies and dependencies),
the number of failed builds (i.e., false negatives), the number of successful
builds (i.e., false positives), and the recall and precision.
As the table shows, the number of false negatives reported by this method
is very low, and in fact, none of them are really false negatives: the files are
used during the build, but not to build the binary. In the aterm package, we
had to include a source file called terms-dict, used by a test case which was not
compiled in the regular build process. Due to the strictness of the Makefile, an
error was raised. The package opkg requires a file called libopkg_test.c to build
a test case in the test suite, which was not present in the resulting executable.
The number of reported false positives is also low. Most of these false
positives occurred because a particular file was not actually used in the build
(e.g. a documentation file) and the build system did not explicitly check for
its presence (e.g. because of wildcards in a build configuration
file). For example, for bash the number of false positives was higher than
the other packages, because it contains a number of documentation files and
gettext translation files, which did not cause an error in the build process if
they were removed.
One interesting, mistakenly reported false positive came from patchelf: the
package uses the elf.h header file, which is also present in the GNU C Library,
so the package still compiles successfully even if it is deleted from patchelf.
10.5.2 Usefulness
In [Germán et al., 2010] we analysed the Fedora 12 distribution to understand
how license auditing is performed. One of the challenges we faced was deter-
mining which files are used to create a binary. This is particularly important
in systems that include files under different licenses that do not allow their
composition (for example, the same system contains a file under the 4-clause
BSD license and a file under the GPLv2, which cannot be legally combined
to form a binary).
From that study we have selected three projects with files that appear in-
compatible with the rest of the system:
FFmpeg is a library to record, convert and stream audio and video, which
we have introduced in Section 10.2. This package may be covered under
different licenses depending on which build time options are used.
cups is a printing system developed by Apple Inc. for Unix. The library
is licensed under a permissive license (the CUPS License). Version 1.4.6
contained 3 files (backend/*scsi*) under an old 4-clause BSD license, which
is not compatible with the CUPS License. For this system (and the next
two) we would like to know if these files are being used to build the
binaries of the product in a way that is in conflict with their license.
bash is a Unix command shell written for the GNU project. Recent ver-
sions have been licensed under the GPLv3+, but it contains a file under
the old 4-clause BSD license (examples/loadables/getconf.c). We analysed version
4.1.
To determine the license of the files in the projects we used Ninka. We traced
the build of each system, and identified the files used. We then compared the
licenses of these files, to try to identify any licensing inconsistency (full-size
renderings of the dependency graphs can be found at http://www.st.ewi.tudelft.nl/~dolstra/icse-2012/,
with problematic files showing as red nodes).
FFmpeg
By default, FFmpeg is expected to create libraries and binary programs under
the LGPLv2+. However, we discovered that two files under the GPLv2+ were
being used during its compilation: libpostproc/postprocess.h and x86/gradfun.c.
The former is an include file and contains only constants. The latter, how-
ever, is source code, and it is used to create the library libavfilter. In other
words, the library libavfilter is licensed under the LGPLv2.1+, but required
libavfilter/x86/gradfun.c, licensed under the GPLv2+. We manually inspected the
Makefiles, which were very complex. It was not trivial to determine if, how,
and why, it was included.
We therefore looked at the library, where we found the symbol correspond-
ing to the function located inside x86/gradfun.c, which gave us confidence that
our method was correct. With this information we emailed the developers
of FFmpeg. They responded within hours. The code in question was an op-
timized version of another function. A developer quickly prepared a patch
to disable its use, and the original authors of x86/gradfun.c were contacted to
arrange relicensing of the file under the LGPLv2.1+ (http://web.archiveorange.com/archive/v/4q4BhMcVOjgXyfN3wyhB).
Three days later, the file's license was changed to the LGPLv2.1+
(http://ffmpeg.org/pipermail/ffmpeg-cvslog/2011-July/039224.html).
cups
The dependency graph of cups shows that the three files in question,
backend/(scsi|scsi-iris|scsi-linux).c, were used to create a binary called /backend/scsi.
Coincidentally, these files were recently removed in response to Bug #3509.
bash
The file in question, examples/loadables/getconf.c, is not used while building the
library. This is consistent with the directory where it is located (examples).
Although the goal of this chapter is not to provide a system capable of cor-
rectly investigating license issues, we have shown that the information prov-
ided by the build graph tracer is sufficient to highlight potential licensing
inconsistencies. These inconsistencies could be solved automatically by devel-
oping a complementary license calculus built on top of the build tracer.
10.6 THREATS TO VALIDITY
In this section, we discuss threats to the validity of our work. It is worth noting
that apart from bugs in the kernel or in our analysis tool, system call tracing is
guaranteed to determine all files accessed during a build, and nothing more.
In that sense there is no risk of false positives or negatives. The real question,
however, is whether it can reconstruct a meaningful dependency graph: a
simple list of files read and written during a build is not necessarily useful.
For instance, a source file might be read during a build but not actually used
in the creation of (some of) the outputs.
Thus, with respect to external validity, as noted in Section 10.4.5, the main
threat is that coarse-grained processes (i.e. those that perform many or all
build steps in a single process) yield dependency graphs that do not accu-
rately portray the “logical” dependency graph. As a result, files may be falsely
reported as dependencies. This can in turn lead to false positives in tools that
use the graph; e.g., in a license analyser, it can cause false warnings of poten-
tial licensing problems. Note however that this is a conservative approach: it
may over-estimate the set of source files used to produce a binary, but that is
safer than accidentally omitting source files.
Therefore, our approach may not work well with coarse-grained build sys-
tems such as Ant, which tends to perform many build steps internally (al-
though Ant can also be instructed to fork the Java compiler, which circum-
vents this problem). In many cases, users can still get usable dependency
graphs by instructing the build tool to build a single target at a time; we then
get traces that only contain the files accessed for each particular target. How-
ever, in Unix-based environments (where most open source components are
used) most build systems are process-based, and can be analysed without
much trouble.
A further threat to external validity would be processes that access files
unnecessarily. (One might imagine a C compiler that does an open() on all
header files in its search path.) However, we have not encountered such cases.
Finally, system call tracing is not portable and differs per system. However,
most widely used operating systems support it in some variant.
With respect to the internal validity in our evaluation of RQ1 and RQ2,
where we used file deletion as an objective measure to verify if a file is or is
not a dependency, a limitation is that complex build processes can fail even
if files that are unnecessary to a specific target are missing. For instance, in a
large project, the build system might build a library in one directory that is
subsequently not used anywhere else.
Construct validity is threatened by limitations in our prototype that can
cause file accesses to be missed. Most importantly, the trace analyser does not
trace file descriptors and inter-process communication (e.g. through pipes)
yet. For instance, it fails to see that the task patch in the command cat foo.patch
| patch bar.c has a dependency on foo.patch; it only sees that patch reads and
recreates bar.c. Our prototype also lacks support for some uncommonly-used
file-related Linux system calls, such as the *at() family of calls. Implementing
the necessary support seems straightforward.
The conclusion validity of our work is mainly limited by the small size of
our evaluation. However, many open source packages use common build
tooling and very similar deployment processes, which should not be too
difficult to analyse with our tooling. The main challenge lies in sup-
porting uncommon open source packages, which deviate from these common
patterns.
10.7 RELATED WORK
Tu and Godfrey argue for a build-time architectural view in their founda-
tional paper [Tu and Godfrey, 2001]. They study three systems, Perl, JNI, and
GCC, to show how build systems can exhibit highly complex architectures.
In this chapter we use the same abstraction level, files and system processes,
to understand structural and compositional properties of built binaries. Our
dependency graphs are based on Tu and Godfrey’s modelling language for
describing build architectures, but we streamline the language to efficiently
communicate the information we are interested in. The systems we study
in this exploratory study are much smaller and simpler than those in Tu and
Godfrey, and in this respect, we contribute an interesting corollary: even “trivial”
systems, e.g. patchelf (Figure 10.5), a single binary built from a single source,
can exhibit a surprisingly complex and nuanced build architecture. Complex
builds may be the rule, rather than the exception, even in small systems.
Tuunanen et al. used a tracing approach to discover the dependency graph
for C-based projects as part of ASLA (Automated Software License Analyzer)
[Tuunanen et al., 2009, 2006]. They modified the C compiler gcc, the linker ld
and the archive builder ar to log information about the actual inputs and out-
puts used during the build. The resulting dependency graph is used by ASLA
to compute license information for binaries and detect potential licensing con-
flicts. However, instrumenting specific tools has several limitations compared
to system call tracing. First, adding the required instrumentation code to pro-
grams could be time consuming (finding the “right” places to instrument the
code); second, many tools would need to be instrumented: compilers, linkers,
assemblers, etc. Finally, a package might contain or build a code generator
and use it to generate part of itself. Clearly, such tools cannot be instrumented
in advance.
Adams et al. analyse debug traces of GNU Make to reverse-engineer large
build systems [Adams et al., 2007]. Their study approaches the build from a
developer’s point of view, with an aim to support build-system maintenance,
refactoring, and comprehension. Adams et al. generate and analyse depen-
dency graphs from extracted trace information, similar to us. However, we
must emphasize the different aim of our study: Adams et al. reverse-engineer
the semantic source architecture of the build to help developers gain a global
picture. We reverse-engineer the logical binary architecture to help developers
and other stakeholders answer questions about composition and provenance
of binary artifacts. Adams et al. strive to understand the build system in and
of itself, whereas we exploit the build system to acquire specific information.
The main use case for this specific information is copyright analysis of
software systems. Disinterring the myriad ways by which “fixed works” of
original authorship are embedded, compiled, transformed, and blended into
other fixed works is crucial for understanding copyright [Rosen, 2004]. A
study by German et al. showed this information is important to owners and
resellers who need to understand the copyright of their systems [Germán
et al., 2010]. German et al. studied package meta-data in Fedora to find license
problems. In their study they showed that Fedora’s current package-depends-
on-package level of granularity, while helpful, is ultimately insufficient for
understanding license interactions in a large open source compilation such as
Fedora. Only a file-depends-on-file level of granularity can work, and this
study directly addresses that problem.
Build Audit [Build Audit, 2004] is a tool to extract and archive useful in-
formation from a build process by tracing system calls. Although it is capa-
ble of showing the invoked processes and involved files, it cannot show the
interaction between processes, which we need. Similarly, in [Attariyan and
Flinn, 2008; Coetzee et al., 2011] strace or ptrace techniques are used to de-
rive dependencies from system calls for parallelizing builds and to diagnose
configuration bugs.
10.8 CONCLUSION
In this chapter we have proposed reverse-engineering the dependency graph
of binaries by tracing build processes; specifically, we showed that we can
do so by tracing system calls issued during builds. Our proof-of-concept pro-
totype implementation suggests that system call tracing is a reliable method
to detect dependencies, though it may over-estimate the set of dependencies.
This is the essential foundation upon which we want to build a license analy-
sis tool.
Acknowledgements We wish to thank Tim Engelhardt of JBB Rechtsanwälte
for his assistance, the FFmpeg developers for their feedback, Karl Trygve
Kalleberg for many comments and discussions on license calculi, and Bram
Adams for pointing out related work.
Part V
Applications
11
Declarative Deployment of WebDSL
applications
ABSTRACT
The tools belonging to the reference architecture for distributed software de-
ployment provide automated solutions for distributed service and infrastruc-
ture deployment tasks. In order to deploy a system designed for a particular
domain, developers or system administrators have to write a number of de-
ployment specifications each capturing a separate concern and they have to
utilise the right deployment tools in the reference architecture for the right
tasks. In order to be able to properly automate the entire deployment process,
it is required to have all the knowledge about implementation details of the
system, which is not always available. Furthermore, manually arranging the
deployment tools to execute the right deployment tasks is not always a trivial
task. In this chapter, we describe a high-level deployment tool for web appli-
cations written in the WebDSL language. WebDSL is a domain-specific lan-
guage for developing web applications with a rich data model on a high-level,
providing static consistency checking and hiding implementation details. As
with the WebDSL language, the webdsldeploy tool hides deployment details
and allows developers or system administrators to automatically and reliably
deploy WebDSL web applications in distributed environments for both pro-
duction use or for testing using high-level deployment specifications.
11.1 INTRODUCTION
In this thesis, we have described various tools addressing different deployment
concerns, ranging from local components to distributed components at the service
and infrastructure levels, and we have seen that these tools support interesting
non-functional properties, such as reliability. In order to fully automate the
deployment process for systems in a particular domain, these tools must be
combined and the required deployment specifications must be provided, which is
not always trivial.
WebDSL [Visser, 2008] was intended as a case study in domain-specific
language engineering developed at Delft University of Technology, consisting
of several declarative sub-languages each capturing a separate aspect, such as
a data model, page definitions and access control. The purpose of the domain-
specific language is to reduce boilerplate code, hide implementation details
and to provide static consistency checking. From a WebDSL program, a Java
web application is generated, which must be deployed in an environment
providing a Java EE application server, a MySQL database server and several
other important infrastructure components.
Although the WebDSL language provides a number of advantages during
development, deploying a WebDSL web application is still complicated, and
requires end-users to know implementation-specific details. In order to de-
ploy a WebDSL application, the WebDSL compiler and its dependencies must
be installed, the Java web application must be generated and the required in-
frastructure components, including the MySQL database and Apache Tomcat
services, must be installed and configured.
Apart from deploying a WebDSL application and its infrastructure com-
ponents on a single machine, it may also be desired to deploy them in a
distributed environment, for example, to support load balancing. As we have
already seen frequently in this thesis, distributed deployment processes are
even more complicated and also require more knowledge about implementa-
tion details.
In this chapter, we describe webdsldeploy, a domain-specific deployment tool
for WebDSL applications that allows developers or system administrators to
deploy WebDSL applications on single machines and in distributed environments,
for production use or testing, using high-level specifications that hide
implementation details.
The purpose of constructing this domain-specific tool is to explore whether
it is feasible to turn the reference architecture described in Chapter 3 into a
concrete architecture (for which we use the web application domain as a case
study) and to see what patterns we can use.
11.2 WEBDSL
As explained earlier, WebDSL [Visser, 2008] was intended as a case study
in domain-specific language engineering enabling users to develop web ap-
plications with a rich data model at a high abstraction level, providing
additional static consistency checking and hiding implementation details.
The main motivation lies in the fact that web applications are often devel-
oped in different languages each addressing a specific concern. For instance,
in Java web applications, a data model is often described in SQL for the cre-
ation of tables, pages are written as JSP pages or Java Servlets (which themselves
embed HTML), the style of a webpage is described in CSS and various dy-
namic elements of web pages are written in JavaScript.
Languages used to develop web applications are often loosely coupled; it is
very difficult to check a web application for errors at compile time. For
instance, a web application written in Java can be checked by the Java com-
piler, but often other languages are embedded inside Java code, such as SQL
code to communicate with the database backend. SQL statements are often
coded in strings, which are not checked by the Java compiler for being
valid SQL. This introduces various problems and security risks, such as SQL
injection attacks. Moreover, there is also a lot of boilerplate code and redundant
Figure 11.1 Stack of WebDSL sub languages
patterns that are used over and over again, which significantly increases the
code size of web applications and therefore also the chance of errors.
WebDSL overcomes these issues by offering a collection of integrated
domain-specific languages. Figure 11.1 illustrates the various sub languages in
WebDSL, such as a sub language to describe access control [Groenewegen and
Visser, 2008] and workflow [Hemel et al., 2008]. High level sub languages are
translated into lower level sub languages and are then unified by the WebDSL
compiler into a single core language [Hemel et al., 2010]. Finally, the core lan-
guage is translated to a web application written in Java code, targeting Java
application servers, or Python code, targeting the Google App Engine (there
have been several experiments in targeting different platforms using WebDSL,
but the Java back-end is the only one that is mature).
Figure 11.1 has been taken from the WebWorkflow paper [Hemel et al.,
2008], which includes more details on the sub languages and their transla-
tions.
11.3 EXAMPLES OF USING THE WEBDSL LANGUAGE
As we have explained, WebDSL is a domain-specific language composed of
various sub languages, each addressing a specific concern of a web application.
In this section, we show a small number of examples.
entity Author {            136
  name :: String           137
  address :: String
  city :: String
}

entity Paper {
  title :: String
  date :: Date
  author -> Author         138
}
Figure 11.2 An example WebDSL Data Model
define page authors() {
  header { "All authors" }
  list() {
    for(a : Author) {
      listitem { output(a) }
    }
  }
}
Figure 11.3 An example WebDSL page definition
11.3.1 Data Model
One of the aspects that WebDSL facilitates is the storage of data and information
about how to present these data entities. A WebDSL data model consists of
entity definitions, which declare entity types. Entities are mapped onto tables
in a relational database by using an Object-Relational Mapping (ORM) imple-
mentation, such as Hibernate [The JBoss Visual Design Team, 2012] used in
the Java back-end.
Figure 11.2 illustrates an example of a WebDSL Data Model in which two
entities are defined: an author, who writes several papers, and a paper, which
is written by an author:
136 This entity statement declares the Author entity.
137 Entities consist of properties with a name and a type. Types can be a value
type, indicated by ::, such as String, Int or Date, or a more complex type,
such as WikiText or Image.
138 Types can also be associations to other entities in the data model. In this
example, we have defined that a paper is written by a single author.
11.3.2 Pages
Apart from specifying the structure of data entities, it is also desired to define
pages which display contents and present data to end-users.
Figure 11.4 WebDSL pages rendered in a browser
define page editPaper(p : Paper) {                                139
  section {
    header { "Edit Paper: " output(p.title) }
    form {
      table {
        row { column {"Title:"} column { input(p.title) } }       140
        row { column {"Date:"} column { input(p.date) } }
        row { column {"Author:"} column { input(p.author) } }
      }
      submit save() { "Save" }                                    141
    }
  }
  action save() {                                                 142
    p.save();
    return viewPaper(p);
  }
}
Figure 11.5 An example of a WebDSL action inside a form
Figure 11.3 shows a simple example of a page definition, in which we show
a list of authors to the end-user. The output element uses the data type of the
object (defined in the data model) to determine what HTML elements are
needed to display the data element. The left image in Figure 11.4 shows how
the page may be rendered.
11.3.3 Actions
Apart from displaying information, it may also be desired to implement oper-
ations that modify information, as shown in Figure 11.5. These operations are
defined as actions described in an imperative sub language. The right image
in Figure 11.4 shows how the page may be rendered:
139 The code fragment defines a page, whose purpose is to edit attributes of
a paper.
140 input elements are used to connect data from variables to entity elements.
141 Actions can be embedded in pages with a form block and attached to a
button or a link. In this case, we define a submit button with the “Save”
label, attached to a save() action.
142 This code block defines the save() action, which stores the values in the
form in the database and redirects the user to a page that views the
paper.
11.4 DEPLOYING WEBDSL APPLICATIONS
In order to deploy a WebDSL application, we must install the WebDSL com-
piler and its dependencies, then generate and build a Java web application
from the WebDSL application and finally deploy the infrastructure compo-
nents required to run a WebDSL application.
11.4.1 Deploying the WebDSL compiler
The WebDSL compiler is implemented by using the Syntax Definition Formal-
ism (SDF2) [Visser, 1997], the Stratego program transformation language [Bra-
venboer et al., 2008], Java-front [Bravenboer and Visser, 2004] which can be
used in conjunction with Stratego to generate or transform Java code and
the ATerm (Annotated Terms) library [Van den Brand et al., 2000], used to
exchange tree-like data structures in a standard and efficient manner. De-
ploying the WebDSL compiler can be done by various means. It can be de-
ployed as a stand-alone command-line compiler tool as well as part of the
WebDSL Eclipse plugin, which provides an integrated IDE for WebDSL using
Spoofax [Kats and Visser, 2010]. Typically, in order to use it properly with
tools in our reference architecture, the command-line version must be used
and all the components must be implemented as Nix packages and deployed
from source code. The WebDSL compiler and its dependencies can be built
using the standard autotools [MacKenzie, 2006] build procedure, i.e. ./configure;
make; make install, in a straightforward manner.
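To give an impression of what this looks like in Nix terms, the following sketch packages an autotools-based component such as the WebDSL compiler; it is not the actual expression from Nixpkgs, and the dependency names, version, URL and hash are hypothetical placeholders:

{ stdenv, fetchurl, strategoxt, aterm }:   # hypothetical dependency names

stdenv.mkDerivation {
  name = "webdsl-1.0.0";                   # hypothetical version

  # Hypothetical download location of a source tarball; the hash is a placeholder
  src = fetchurl {
    url = "http://example.org/webdsl-1.0.0.tar.gz";
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };

  buildInputs = [ strategoxt aterm ];

  # No build phases are specified: stdenv.mkDerivation runs the standard
  # autotools procedure (./configure --prefix=$out; make; make install) by default.
}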
11.4.2 Building a WebDSL application
In order to be able to use a WebDSL application, it must be compiled to a
Java web application. First, the WebDSL application must be configured to
contain deployment settings in a configuration file called application.ini, which
contains key = value pairs describing various deployment properties, such
as the JNDI resource identifier of the database connection and the address of
the SMTP server. Apart from generating Java code from WebDSL application
code, the Java code must also be compiled and packaged in a Web Application
Archive (WAR). In order to compile and package a Java web application, the
Java Development Kit must be installed, as well as Apache Ant
(http://ant.apache.org), which is used to run the Java build processes.
11.4.3 Deploying WebDSL infrastructure components
After the Java web application has been built, it must be deployed in an en-
vironment providing the required infrastructure components. In order to run
a WebDSL application, we must install the Apache Tomcat application server,
the MySQL database server and an SMTP server, if the e-mail functionality is
used. For efficiency, a native Apache web server may be used to serve static
content of web applications. Furthermore, in order to allow an application to
have better performance, several configuration settings must be adapted, such
as the maximum heap memory usage.
The infrastructure components can be deployed on a single machine, but
may also be distributed across multiple machines. For example, the database
server may be deployed on a different machine than the Apache Tomcat web ap-
plication. Also, multiple redundant instances of the application and database
servers may be deployed to provide load balancing.
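To give an impression of what such an environment amounts to in NixOS terms, the following hand-written sketch enables the required infrastructure services on a single machine; it is an illustration rather than what webdsldeploy generates, the option names are assumed to match the standard NixOS modules of that era, and the administrator address is a placeholder:

{ pkgs, ... }:

{
  # Apache Tomcat hosting the generated web applications, with the MySQL
  # JDBC driver as a common library (see also Figure 11.11)
  services.tomcat.enable = true;
  services.tomcat.commonLibs =
    [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];

  # MySQL database server storing the data model entities
  services.mysql.enable = true;

  # Apache web server serving static content in front of Tomcat
  services.httpd.enable = true;
  services.httpd.adminAddr = "admin@example.org";   # placeholder address
}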
11.5 WEBDSLDEPLOY
To provide automatic, reliable and reproducible deployment of WebDSL ap-
plications from a high-level deployment specification, we have developed
webdsldeploy, a Nix function that produces a NixOS logical network config-
uration from a WebDSL deployment specification. As we have described in
Chapter 6, the NixOS network specification is used in conjunction with a
physical network specification to deploy a network of NixOS machines and
can also be used to generate a network of virtual machines, which can be
cheaply instantiated for testing, as we have described in Chapter 7. In this sec-
tion, we show a number of examples ranging from very simple specifications
for single machines to more complicated expressions specifying properties of
the underlying distributed infrastructure.
11.5.1 Single machine deployment
Figure 11.6 shows a simple use case of the WebDSL deployment function,
whose purpose is to deploy two WebDSL applications, including all their de-
pendencies and infrastructure components, on a single machine. This is the
most common use case of the WebDSL deployment function.
143 This import statement invokes the WebDSL deployment function, which gener-
ates a logical network specification.
144
The applications parameter is a list of attribute sets specifying the Web-
DSL applications that must be deployed. In our example, we deploy
import <webdsldeploy/generate-network.nix> {          143
  applications = [                                     144
    { name = "yellowgrass";
      src = /home/sander/yellowgrass;
      rootapp = true;
    }
    { name = "webdslorg";
      src = /home/sander/webdslorg;
    }
  ];
  databasePassword = "admin";                          145
  adminAddr = "s.vanderburg@tudelft.nl";               146
}
Figure 11.6 single.nix: A WebDSL deployment scenario on a single machine
two WebDSL applications named Yellowgrass and webdsl.org. Each at-
tribute set takes a number of common attributes. The name specifies
the name of the application, the src attribute specifies where the source
code of the WebDSL application can be found (it can refer to a location
on the filesystem, or be bound to a function call, such as fetchsvn, to obtain
it from a Subversion repository; a sketch of the latter follows after this
list). The rootapp attribute is optional and defaults to false. If it is set to
true then the resulting web application is called ROOT and can be accessed
from the root directory of a particular virtual host configured on Apache
Tomcat. There can be only
one ROOT application per virtual host.
145 WebDSL applications require a password to connect to a database. The
databasePassword parameter sets the password.
146 The webserver proxies require an administrator e-mail address, which
can be configured by setting the adminAddr parameter.
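As an illustration of binding the src attribute to a function call instead of a local path, the following sketch is a variant of Figure 11.6 that uses fetchsvn; the repository URL, revision and hash are hypothetical placeholders, and importing Nixpkgs to obtain fetchsvn is our own assumption about the wiring:

let
  pkgs = import <nixpkgs> {};
in
import <webdsldeploy/generate-network.nix> {
  applications = [
    { name = "yellowgrass";
      # Obtain the source from a (hypothetical) Subversion repository
      src = pkgs.fetchsvn {
        url = "https://svn.example.org/yellowgrass/trunk";   # hypothetical repository
        rev = 1234;                                          # hypothetical revision
        sha256 = "0000000000000000000000000000000000000000000000000000";  # placeholder
      };
      rootapp = true;
    }
  ];
  databasePassword = "admin";
  adminAddr = "s.vanderburg@tudelft.nl";
}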
11.5.2 Distributing infrastructure components
Figure 11.7 shows a more complicated deployment expression, in which the
infrastructure components are divided across two machines:
147 If the distribution parameter is specified, then the infrastructure compo-
nents can be divided across machines in the network.
148 The machine test1 is configured to host the Apache Tomcat and Apache
webserver proxies.
149 The machine test2 is configured to host the MySQL database server.
11.5.3 Implementing load-balancing
Figure 11.8 shows an even more complicated example with three machines,
in which the first one serves as a reverse proxy server redirecting incoming
import <webdsldeploy/generate-network.nix> {
  applications = [
    { name = "yellowgrass";
      src = /home/sander/yellowgrass;
      rootapp = true;
    }
  ];
  databasePassword = "admin";
  adminAddr = "s.vanderburg@tudelft.nl";

  distribution = {                          147
    test1 = {                               148
      tomcat = true;
      httpd = true;
    };
    test2 = {                               149
      mysql = true;
    };
  };
}
Figure 11.7 distributed.nix: A distributed WebDSL deployment scenario in which the
database is deployed on a separate machine
import <webdsldeploy/generate-network.nix> {
  applications = [
    { name = "yellowgrass";
      src = /home/sander/yellowgrass;
      rootapp = true;
    }
  ];
  databasePassword = "admin";
  adminAddr = "s.vanderburg@tudelft.nl";

  distribution = {
    test0 = {                               150
      proxy = true;
    };
    test1 = {                               151
      tomcat = true;
      httpd = true;
      mysqlMaster = true;
    };
    test2 = {                               152
      tomcat = true;
      httpd = true;
      mysqlSlave = 2;
    };
  };
}
Figure 11.8 loadbalancing.nix: A WebDSL deployment scenario in which multiple
application servers and database servers are used
calls to the other web servers. The other two machines are both redundant
application, web and database servers:
150 The machine test0 is configured to run a reverse proxy server and serves
as the entry point to end-users. The reverse proxy server forwards in-
coming connections to machines hosting a webserver (httpd component)
using the round-robin scheduling method. In our example, connections
are forwarded to test1 or test2.
151 The machine test1 is configured to run an Apache Tomcat server and an
Apache webserver proxy. Furthermore, it hosts a MySQL master server,
to which all database write actions are redirected.
152 The machine test2 is configured to use the same infrastructure compo-
nents as the test1 machine. Instead of a MySQL master server, it hosts
a MySQL read slave server, which is regularly synchronised with the
master server. Furthermore, database read operations from a WebDSL
application are forwarded to any of the MySQL servers. Each MySQL
read slave requires a unique id, which starts with 2 (1 is assigned to the
master server).
11.5.4 Usage
There are two ways to use the WebDSL deployment function. The first use
case is to build a virtual network of machines, which can be used to easily
test a WebDSL application in an environment closely resembling the produc-
tion environment. The following example generates and runs a single virtual
machine defined in Figure 11.6:
$ nixos-build-vms single.nix
$ ./result/bin/nixos-run-vms
Another use case is to use charon and a physical network specification to
automatically deploy a WebDSL application in a network of machines. The
following example deploys the load balancing example, shown in Figure 11.8,
in a network of Amazon EC2 virtual machines:
$ charon create loadbalancing.nix ec2.nix
$ charon deploy
11.6 IMPLEMENTATION
In order to generate a logical network specification from a WebDSL specifica-
tion, we must derive how to build and configure a WebDSL application and
the NixOS configuration settings for each machine in the network.
11.6.1 Building and configuring a WebDSL application
Figure 11.9 shows how the WebDSL application build function is implemented:
153 This expression is essentially a nested function. The outer function is
used to compose the inner function by providing the required WebDSL
212
{stdenv, webdsl, apacheAnt}:                                                153
{ name, src, dbserver, dbname, dbuser, dbpassword, dbmode ? "update"
, rootapp ? false, replication ? false }:                                   154

let
  war = stdenv.mkDerivation {                                               155
    inherit name src;
    buildInputs = [ webdsl apacheAnt ];

    buildPhase = ''                                                         156
      ( echo "appname=${name}"
        echo "db=jndi"
        echo "dbmode=${dbmode}"
        echo "dbjndipath=java:/comp/env/jdbc/${dbname}"
        echo "indexdir=/var/tomcat/temp/indexes/${name}"
        echo "rootapp=${if rootapp then "true" else "false"}"
      ) > application.ini

      webdsl war
    '';

    installPhase = ''                                                       157
      mkdir -p $out/webapps
      cp .servletapp/*.war $out/webapps
    '';
  };

  driverClassName = if replication
    then "com.mysql.jdbc.ReplicationDriver" else "com.mysql.jdbc.Driver";   158

  replicationParameter = if replication
    then "&amp;roundRobinLoadBalance=true"
    else "";                                                                159
in
stdenv.mkDerivation {                                                       160
  inherit name;

  buildCommand = ''
    mkdir -p $out/conf/Catalina
    cat > $out/conf/Catalina/${if rootapp then "ROOT" else name}.xml <<EOF
    <Context>
      <Resource name="jdbc/${dbname}" auth="Container" type="javax.sql.DataSource"
        driverClassName="${driverClassName}"
        username="${dbuser}" password="${dbpassword}"
        url="jdbc:mysql://${dbserver}:3306/${dbname}?${replicationParameter}" />
    </Context>
    EOF

    mkdir -p $out/webapps
    ln -s ${war}/webapps/*.war $out/webapps
  '';
}
Figure 11.9 webdslbuild.nix: Nix function for building a WebDSL application
build-time dependencies, such as the standard UNIX environment, the
WebDSL compiler and Apache Ant. Typically, this expression is first
composed by calling it with the right arguments, as we have seen earlier
in Section 2.2.3.
154 The inner function defines the actual build function which should be
invoked by users to build WebDSL applications. The inner function
header takes a number of arguments. The name argument corresponds
to the name of the WebDSL application, the src argument refers to the
location of the source code of the WebDSL application. The inner func-
tion also takes a number of database-related arguments, the rootapp
argument, which defines whether the application should be accessible
from the root path in a virtual host, and the replication parameter, which
indicates whether a master-slave MySQL replication infrastructure should be
used.
155 This local attribute is bound to a build result that produces a Web Ap-
plication Archive (WAR) from WebDSL application source code.
156 The build phase procedure generates an application.ini file from the func-
tion arguments defined at 154 . Then it invokes the webdsl war command-
line instruction, which generates Java code, compiles the Java code and
assembles a web application archive.
157 The install phase installs the generated WAR file in the webapps/ direc-
tory of the resulting Nix package.
158 The local attribute driverClassName specifies the name of the JDBC driver
that the generated WebDSL application should use. If no master-slave
configuration is used then the ordinary com.mysql.jdbc.Driver is used. If
the replication option is enabled then the more advanced
com.mysql.jdbc.ReplicationDriver is used.
159 If the master-slave replication feature is enabled the JDBC connection
must be configured to forward read operations to the read slaves using
the round-robin scheduling method.
160 In the actual body of the function, we generate a wrapper component
consisting of a symlink to the WAR file we have generated at 155 and a
context XML configuration file. As we have seen earlier in Chapter 4,
this configuration file is used by Apache Tomcat, to configure various
application resources, such as database connections. In our example,
we generate a JDBC resource for the settings provided by the function
arguments, including the authentication and replication settings.
11.6.2 Generating a logical network specification
Figure 11.10 contains a simplified code fragment written in the Nix expression
language, showing how the network generator is implemented:
161 The code fragment is a function taking several arguments. The pkgs
argument refers to the Nixpkgs collection and imports the default col-
lection if it is not specified. The applications parameter is, as we have
seen earlier in Figure 11.6, a list of attribute sets defining several prop-
erties of WebDSL applications. The distribution parameter specifies, as
we have seen earlier in Figure 11.7, which infrastructure components
should be deployed to which machine. If the parameter is omitted, it
assigns all the infrastructure components to a single machine called test.
{ pkgs ? import <nixpkgs> {}, applications
, distribution ? { test = { tomcat = true; httpd = true; mysql = true; ... }; }
, databasePassword, adminAddr
}:                                                                 161

with pkgs.lib;

let
  webdslBuild = import <webdsldeploy/webdslbuild.nix> {            162
    inherit (pkgs) stdenv webdsl apacheAnt;
  };

  webapps = map (application:                                      163
    webdslBuild {
      inherit (application) name src;
      rootapp = if application ? rootapp then application.rootapp else false;
      dbname = if application ? databaseName
        then application.databaseName
        else application.name;
      ...
    }
  ) applications;
in
mapAttrs (targetName: options:                                     164
  {pkgs, ...}:

  {
    require = [ <webdsldeploy/modules/webdsltomcat.nix> ... ];     165
  } //
  optionalAttrs (options ? tomcat && options.tomcat) {             166
    webdsltomcat.enable = true;
    webdsltomcat.webapps = webapps;
    ...
  } //
  optionalAttrs (options ? mysql && options.mysql) {               167
    webdslmysql.enable = true;
    webdslmysql.databasePassword = databasePassword;
    ...
  } //
  optionalAttrs (options ? httpd && options.httpd) {               168
    webdslhttpd.enable = true;
    webdslhttpd.adminAddr = adminAddr;
    ...
  } //
  ...
) distribution
Figure 11.10 generate-network.nix: Simplified structure of the WebDSL deployment
function
162 Here the webdslBuild attribute is bound to a statement that imports and
composes the webdslBuild function, which we have shown earlier in Fig-
ure 11.9.
163 Here, we iterate over every application and we invoke the webdslBuild
function, with the required parameters. The result of the map operation
is a list of derivations that produce the WAR files.
164 Here, we iterate over all attributes in the distribution function parameter.
For each attribute, we generate a NixOS configuration that captures all
required infrastructure properties. The result is a logical network speci-
fication as we have shown in Chapter 6.
165 The require attribute is used by the NixOS module system to include
arbitrary NixOS modules. Here, we import all the custom NixOS mod-
ules that capture WebDSL specific aspects, such as webdsltomcat, webd-
slmysql, and webdslhttpd.
166 This section sets the Apache Tomcat properties of the WebDSL environ-
ment if the tomcat attribute of the distribution element is defined and set
to true. Later in Figure 11.11, we show how the related NixOS module is
defined.
167 This section sets the MySQL properties of the WebDSL environment if
the mysql attribute of the distribution element is defined and set to true.
168 This section sets the Apache webserver properties of the WebDSL envi-
ronment if the httpd attribute of the distribution element is defined and
set to true.
The actual implementation of the function is a bit more complicated and
solves many more problems. For example, to properly support MySQL master-
slave replication, we have to generate a connection string that consists of the
hostname of the master server, followed by the slave hostnames each sepa-
rated by a comma. In order to generate such a string, we have to search for
the right attributes in the distribution. A similar approach is required to config-
ure the reverse proxy server to support web server load balancing.
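The following sketch gives a rough impression of how such a connection string could be derived from the distribution attribute set of Figure 11.8 with the pkgs.lib functions; it is an illustration, not the actual webdsldeploy implementation:

{ pkgs ? import <nixpkgs> {} }:

with pkgs.lib;

let
  # The distribution of Figure 11.8, restricted to the MySQL roles
  distribution = {
    test1 = { mysqlMaster = true; };
    test2 = { mysqlSlave = 2; };
  };

  # Hostnames of the machines providing the master and slave roles
  masterHosts = attrNames (filterAttrs (name: options:
    options ? mysqlMaster && options.mysqlMaster) distribution);
  slaveHosts = attrNames (filterAttrs (name: options:
    options ? mysqlSlave) distribution);
in
# Master first, then the slaves; evaluates to "test1,test2" for this example
concatStringsSep "," (masterHosts ++ slaveHosts)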
11.6.3 Configuring WebDSL infrastructure aspects
The last missing piece in the puzzle is how various infrastructure aspects
are configured. In Figure 11.10 we configure properties of various WebDSL
NixOS modules. Figure 11.11 shows the NixOS module that captures Apache
Tomcat specific aspects:
169 This section defines all the options of the Apache Tomcat that hosts com-
piled WebDSL applications. As we have seen in Figure 11.10, we have to
decide whether to enable the module and what compiled WebDSL web
applications it should host.
170 This section defines the relevant Apache Tomcat NixOS configuration
settings (Apache Tomcat is a module included with NixOS), if the mod-
ule is enabled. Here, we enable the Apache Tomcat NixOS module and
we set the web applications that it should host. The WebDSL NixOS
module properties are identical to the ones of the Apache Tomcat NixOS
module.
171 This section defines a collection of libraries that are shared between the
Servlet container and the web applications. As mentioned earlier, all
{config, pkgs, ...}:

with pkgs.lib;

let
  cfg = config.webdsltomcat;
in
{
  options = {                                                               169
    webdsltomcat = {
      enable = mkOption {
        description = "Whether to enable the WebDSL tomcat configuration";
        default = false;
      };
      webapps = mkOption {
        description = "WebDSL WAR files to host";
        default = [];
      };
    };
  };

  config = mkIf cfg.enable {                                                170
    services.tomcat.enable = true;
    services.tomcat.webapps = cfg.webapps;
    services.tomcat.commonLibs =
      [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];         171
  };
}
Figure 11.11 webdsltomcat.nix: NixOS module capturing WebDSL related Apache
Tomcat configuration aspects
WebDSL applications require a JDBC connection to a MySQL database.
Therefore, we have to include the MySQL JDBC driver.
11.7 EXPERIENCE
We have used the WebDSL deployment function for a number of WebDSL
applications, such as Yellowgrass (http://yellowgrass.org), webdsl.org
(http://webdsl.org) and researchr (http://researchr.org). We were able
to deploy complete WebDSL environments in physical networks, VirtualBox
virtual machines, Amazon EC2 virtual machines and QEMU virtual machines
(with the NixOS test driver). We can use the webdsldeploy function both with
charon to actually deploy machines and with the NixOS test driver to run them
non-interactively and to automatically perform test cases.
The webdsldeploy function also supports a number of interesting upgrade
operations. For example, by adding redundant MySQL slave, application server
or web server components to machines in the distribution (or removing them), a
WebDSL application can be scaled up or scaled down with almost no down-
time (because most operations performed by Nix do not affect running
systems, as we have explained in previous chapters).
11.8 DISCUSSION
In this chapter, we have constructed a domain-specific deployment tool us-
ing domain-specific deployment expressions. We can transform a domain-
specific deployment expression into a logical network specification, by using
a number of Nix utility functions and the NixOS module system to capture
infrastructure component properties relevant to the domain.
Although the webdsldeploy function provides several interesting features, it
also has a number of limitations. The major limitation is that it is not capable
of migrating state. For example, if a user or developer moves a database
from one machine to another, then the database contents is not automatically
migrated. Although we have developed a solution for state in Chapter 9,
the cost of supporting mutable components may be high and require further
investigation. The size of the researchr database is (at the time writing this
thesis) 60 GiB, which is too costly to write entirely to the filesystem for each
database change. As a practical issue, Dysnomia has not been integrated in
NixOS yet.
WebDSL data models may also evolve. In some cases, the underlying
database may no longer be compatible with newer versions of a WebDSL
application. To deal with evolving data models, a separate prototype tool
has been developed, called Acoda [Vermolen et al., 2011]. In Acoda, a devel-
oper can specify the evolution of the original data model in a domain-specific
sub language (which is intended to be integrated in WebDSL) and migrate
the data to the new data model automatically. However, Acoda has not been
integrated in the webdsldeploy function yet. Furthermore, in order to support
Acoda properly, supporting state migrations is an essential precondition.
The webdsldeploy function only uses the infrastructure deployment tools
from the reference architecture for distributed software deployment. It is also
possible to utilise service deployment aspects, for example, to use Disnix to
deploy the databases and web applications. However, the application archi-
tecture of a WebDSL application is relatively simple (it only consists of a web
application and a database) and therefore it is not really beneficial to imple-
ment a service deployment layer. That said, there is ongoing work on
implementing web service support in WebDSL, which may significantly in-
crease the deployment complexity of WebDSL applications. In the future, we
may extend webdsldeploy to utilise Disnix to deploy all service components.
11.9 CONCLUSION
In this chapter, we have described webdsldeploy, a domain-specific deployment
function for WebDSL applications. This function uses a high-level deployment
expression from which a logical network specification is derived, which can
be used by charon to deploy a network of machines and by the NixOS test
driver to launch a network of efficient virtual machines. A major limitation
of the tool is that state migration is not supported. As a consequence, for
some types of upgrades the database contents are not migrated.
12
Supporting Impure Platforms: A .NET Case
Study
ABSTRACT
The tools in our reference architecture offer various benefits, such as relia-
bility and reproducibility guarantees, which are not always trivial to achieve
with conventional deployment tools. However, in order to use any of these
tools properly, we have made several assumptions about the system that is being
deployed, such as the fact that every component can be properly isolated. In
practice, we may want to use these tools to deploy large existing codebases,
which were typically deployed by other means and in impure environments.
In these situations, adopting Nix or any Nix application is a complicated pro-
cess with a number of technical challenges. In this chapter, we explain how
we have implemented deployment support in Nix and Disnix for .NET soft-
ware on Microsoft Windows platforms, so that we can automate deployment
processes of service-oriented systems utilising .NET technology. We have con-
ducted these experiments at Philips Healthcare.
12.1 INTRODUCTION
In the previous chapters, we have described the components belonging to the
reference architecture for distributed software deployment and we have ex-
plained that they offer a number of distinct advantages compared to conven-
tional deployment tools. Moreover, we have shown how we can turn the refer-
ence architecture for distributed software deployment into a domain-specific
deployment tool for WebDSL applications.
We may also want to apply any of these tools to large code bases, which
have previously been deployed by other means, in impure environments and
possibly in an ad-hoc manner. In these situations, adopting Nix or any Nix
application to automate the entire deployment process is complicated, as
certain facilities must be implemented and a number
of technical challenges must be overcome, such as changes in the architecture
of the system that must be deployed.
In our research project, we have collaborated with Philips Healthcare, who
develop and maintain a medical imaging platform implemented using .NET
technology on the Microsoft Windows operating system. In Microsoft Win-
dows environments, many software packages are proprietary and typically
deployed by using tools implementing imperative deployment concepts. These
packages often cannot be isolated and they have implicit assumptions about the
environment (such as certain locations on the file systems), for which we have
to implement a number of workarounds.
In this chapter, we describe how we have implemented .NET deployment
support in Nix and Disnix (more technical details can be found in the blog
posts [van der Burg, 2011b,c,d]).
12.2 MOTIVATION
There are a number of reasons why we want to implement .NET deployment
support in Nix. One of the reasons is that installing or upgrading .NET ap-
plications with Nix offers the same deployment benefits that Nix provides for
other software, such as being
able to store multiple versions/variants next to each other, dependency com-
pleteness, atomic upgrades and rollbacks and a garbage collector which safely
removes components no longer in use.
Another reason is that we can utilise the applications built around Nix,
such as Hydra, the continuous build and integration server, to build and test
.NET applications in various environments (including their environmental depen-
dencies), and Disnix, to manage the deployment of service-oriented applica-
tions and web applications developed using .NET technology in a network of
machines.
not developed for a particular component technology). Being able to support
.NET applications is a useful addition. In this chapter, we investigate what
means are necessary to support automatic deployment using tools in our ref-
erence architecture.
12.3 GLOBAL ASSEMBLY CACHE (GAC)
Traditional .NET applications reference library assemblies from a central cache
called the Global Assembly Cache (GAC) [Microsoft, 2010a] designed as a so-
lution to counter the well-known DLL hell [Pfeiffer, 1998]. Its purpose is
comparable with the Nix store. Although the GAC solves several common
deployment issues with Windows applications, it has a number of drawbacks
compared to the Nix store:
It only provides isolation for library assemblies. Other components such
as executables, compilers, configuration files, or native libraries are not
supported.
A library assembly must have a strong name, which gives a library a
unique name. A strong name is composed of several attributes, such as
a name, version number and culture. Furthermore, the library assembly
is signed with a public/private key pair.
Creating a strong-named assembly is in many cases painful. A devel-
oper must take care that the combination of attributes is always unique.
For example, for a new release the version number must be increased.
Because developers have to take care of this, people typically do not use
strong names for internal release cycles, because it is too much work.
Creating a strong-named assembly may go wrong. A developer may forget
to update any of these strong name attributes, which makes it possible
to create a different assembly with the same strong name; in that case,
isolation in the GAC cannot be provided.
In contrast to the GAC, you can store any type of component in the Nix
store, such as executables, configuration files, compilers etc. Furthermore,
the Nix store uses hash codes derived from all build-time dependencies of a
component, which always provides unique component file names.
12.4 DEPLOYING .NET APPLICATIONS WITH NIX
In order to be able to deploy software components implemented in .NET tech-
nology with the Nix package manager, we must provide build-time support,
consisting of a function that builds components and is capable of finding
the required build-time dependencies, and we must implement runtime
support so that the produced executables in the Nix store are capable of find-
ing their required runtime dependencies.
12.4.1 Using Nix on Windows
As we have explained earlier, the Nix package manager is supported on various
operating systems, including Microsoft Windows through Cygwin. Cygwin
acts as a UNIX compatibility layer on top of the Win32 API.
12.4.2 Build-time support
Typically, software systems implemented with .NET technology are compo-
nentised by dividing a system into a collection of Visual Studio projects (a
Visual Studio project is not necessarily a component in a strict sense and would
only map to Szyperski's definition [Szyperski et al., 2002] if developers take a
number of important constraints into account). Visual
Studio projects can be built from the Visual Studio IDE or non-interactively
by running the MSBuild command-line tool.
Invoking the .NET framework
The MSBuild tool, which is used to build Visual Studio projects non-interactively,
is included with the .NET framework. The .NET framework is tightly in-
tegrated into the Windows operating system and also requires several reg-
istry settings to be configured. For these reasons, we cannot isolate the .NET
framework in the Nix store in a pure manner, as it has dependencies on com-
ponents that reside outside the Nix store and cannot be reproduced purely.
² A Visual Studio project is not necessarily a component in a strict sense and would only
map to Szyperski's definition [Szyperski et al., 2002] if developers take a number of important
constraints into account.
{stdenv, dotnetfx}:                                                172

{ name, src, slnFile, targets ? "ReBuild"
, options ? "/p:Configuration=Debug;Platform=Win32"
, assemblyInputs ? []                                              173
}:

stdenv.mkDerivation {                                              174
  inherit name src;                                                175
  buildInputs = [ dotnetfx ];                                      176
  installPhase = ''                                                177
    for i in ${toString assemblyInputs}; do
      windowsPath=$(cygpath --windows $i)
      AssemblySearchPaths="$AssemblySearchPaths;$windowsPath"
    done
    export AssemblySearchPaths

    ensureDir $out
    outPath=$(cygpath --windows $out)\\
    MSBuild.exe ${slnFile} /nologo /t:${targets} \
      /p:OutputPath=$outPath ${options} ...
  '';
}
Figure 12.1 A Nix function for building a Visual Studio C# project
To make the MSBuild tool available during the build phase of a Nix compo-
nent, we have created a proxy component, which is a Nix package containing
a symlink to a particular MSBuild.exe executable hosted on the host system
outside the Nix store. Every variant of the .NET framework is stored in a
separate directory on the host filesystem and has its own version of MSBuild.
For example, by creating a symlink to
/cygdrive/c/WINDOWS/Microsoft.NET/Framework/v4.0.30319/MSBuild.exe, we can imple-
ment a proxy component for the .NET framework version 4.0.
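To make this concrete, the sketch below shows what such a proxy component could look like as a Nix expression. It is a minimal sketch, assuming that Cygwin mounts the Windows system drive at /cygdrive/c and that the .NET framework 4.0 resides in the directory mentioned above; the attribute and derivation names are chosen for illustration only, and the actual proxy component in our implementation may differ in details.

{ stdenv }:

stdenv.mkDerivation {
  name = "dotnetfx-4.0.30319";

  buildCommand = ''
    ensureDir $out/bin

    # The symlink deliberately points outside the Nix store, to the MSBuild.exe
    # of the .NET framework installed on the host system. The .NET framework
    # itself is therefore not part of the closure of packages using this proxy.
    ln -s /cygdrive/c/WINDOWS/Microsoft.NET/Framework/v4.0.30319/MSBuild.exe \
      $out/bin/MSBuild.exe
  '';
}

Adding such a proxy component to the buildInputs of a derivation is enough to put MSBuild.exe in the PATH of a builder, while keeping the impurity explicit and confined to a single package.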
The major obstacle in using a proxy component is that dependencies are not
automatically included in the closure of a package, as the actual dependencies
reside on the host system, and must be installed by other means. For example,
if we want to copy a closure from one machine to another, then we must
manually ensure that the required .NET framework is present on the target
machine.
Another weakness of proxy components is that we cannot always guaran-
tee proper isolation. Fortunately, the .NET framework is stored in a sepa-
rate folder instead of a global namespace such as C:\WINDOWS\System32. The
name and version number in the path name ensure that the .NET framework
is isolated, although this is a nominal dependency scheme that does not take
all relevant build attributes into account, such as the system architecture.
Implementing a Nix build function
Figure 12.1 shows how we have implemented a function capable of building
Visual Studio C# projects:
172 This expression is essentially a nested function. The outer function is
used to compose the inner function by providing the required .NET
build-time dependencies, such as the standard UNIX environment (stdenv)
and the .NET framework (dotnetfx). Typically, this expression is first com-
posed by calling it with the right arguments, as we have seen earlier in
Section 2.2.3.
173 The inner function defines the actual build function that should be
invoked by users to build .NET applications. The inner function header
takes a number of arguments. The name argument corresponds to the
name of the resulting component that ends up in the Nix store path,
the src argument refers to the location of the source code of the .NET
application, the slnFile argument refers to a Visual Studio solution file
that defines how to build the project, the targets argument specifies which
build targets have to be processed (defaults to ReBuild), the options argu-
ment specifies additional build parameters for MSBuild, and the assemblyInputs
argument is a list of derivations containing library assemblies (which
themselves may be produced by other Nix derivation functions) that
should be found at build time.
174 In the body of the inner function, we invoke stdenv.mkDerivation, which,
as we have seen many times earlier, builds a package from source code
and a number of given dependencies.
175 Here, we pass the given name and src attributes to the builder function,
which takes care of unpacking the source code.
176 The buildInputs parameter specifies all build-time dependencies for the
standard builder. Here, we pass dotnetfx as a build input, so that the
MSBuild tool can be found.
177 The installPhase defines all the installation steps. Here, we compose the
AssemblySearchPaths environment variable, which is a semicolon-separated
string of path names in which the MSBuild tool looks for its build-
time dependencies. Because Nix is using Cygwin, the path names are
in UNIX format with forward slashes. We use the cygpath command to
convert them to Windows path names using backslashes. After compos-
ing the environment variable, we create the output directory in the Nix
store and we run MSBuild to build the Visual Studio project. The MSBuild
tool has been instructed to store its output in the Nix store.
12.4.3 Run-time support
Apart from building .NET applications, we must also be able to run them
from the Nix store. As we have explained earlier, native ELF executables use
the RPATH header, which contains the Nix store paths of all library dependencies.
Furthermore, for other types of components, such as Java, Perl or Python
programs, we can use environment variables, such as CLASSPATH, PERL5LIB
and PYTHONPATH, to specify where run-time dependencies can be found. Nix
provides several convenience functions that wrap programs in shell scripts
that automatically set these environment variables.
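As an illustration, the sketch below shows how such a wrapper could be generated for a hypothetical Java program by means of the makeWrapper convenience function from Nixpkgs; the package name, the jar files and the somelibrary dependency are assumptions made purely for this example.

{ stdenv, makeWrapper, jre, somelibrary }:

stdenv.mkDerivation {
  name = "myjavaprogram-1.0";
  src = ./myjavaprogram;

  buildInputs = [ makeWrapper ];

  installPhase = ''
    ensureDir $out/share/java $out/bin
    cp *.jar $out/share/java

    # Generate $out/bin/myjavaprogram: a shell script that extends CLASSPATH
    # with the Nix store path of the required library and then invokes java
    # on the program's jar file.
    makeWrapper ${jre}/bin/java $out/bin/myjavaprogram \
      --add-flags "-jar $out/share/java/myjavaprogram.jar" \
      --prefix CLASSPATH : "${somelibrary}/share/java/somelibrary.jar"
  '';
}

The generated script simply sets the environment variable and then executes the real program.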
Unfortunately, the .NET runtime does not have such a facility, for example an
environment variable, to search for run-time dependencies³. The .NET runtime
locates run-time dependencies roughly as follows [Microsoft, 2010b]:
First, the .NET runtime tries to determine the correct version of the as-
sembly (only for strong-named assemblies).
If the strong-named assembly has been bound in memory before, that
version will be used.
If the assembly is not already in memory, it checks the Global Assembly
Cache.
If all the previous steps fail, it probes for the assembly, by looking in a
configuration file or by using probing heuristics. The probing heuristics
search the base directory of the executable or a number of specified
subdirectories. Only strong-named assemblies can be probed from arbitrary
filesystem locations and remote URLs; however, they are always cached
in the Global Assembly Cache.
Because Nix stores all components, including library assemblies, in
unique folders in the Nix store, this poses some challenges. We cannot use the
Global Assembly Cache (GAC), as it is an impurity in the Nix deployment
process. Hence, if an executable is started from the Nix store, the required
libraries cannot be found, because the probing heuristics only look for libraries
in the same base directory as the executable, where they are never present.
Implementing a wrapper for .NET applications
To allow a .NET executable assembly to find its run-time dependencies, we
have implemented a different and more complicated wrapper. The
wrapper is not a script, but a separate executable assembly that implements
an event handler, which the .NET runtime invokes when a run-time depen-
dency cannot be found, and which uses the reflection API to load library assemblies
from arbitrary Nix store paths. We have based this wrapper on the information
provided by the following Knowledge Base article: [Microsoft Support, 2007].
Figure 12.2 shows the layout of the wrapper class, which dynamically loads
all required runtime dependencies and starts the Main() method of the main
executable:
178 The AssemblySearchPaths field defines an array of strings containing the Nix
store paths of all library assemblies of the executable.
179 The ExePath field refers to the actual executable assembly containing the
Main() method.
³ The Mono runtime (http://mono-project.com), an alternative .NET implementation, does
support this by means of the MONO_PATH environment variable. However, Mono does not
provide compatibility for all Microsoft APIs.
using System;
using System.Reflection;
using System.IO;
namespace HelloWorldWrapper
{
class HelloWorldWrapper
{
private String[] AssemblySearchPaths = {
@"C:\cygwin\nix\store\23ga...-ALibrary",
@"C:\cygwin\nix\store\833p...-BLibrary"
}; 178
private String ExePath =
@"C:\cygwin\nix\store\27f2...-Executable\Executable.exe"; 179
private String MainClassName =
"SomeExecutable.Executable"; 180
static void Main(string[] args) 181
{
new HelloWorldWrapper(args);
}
public HelloWorldWrapper(string[] args) 182
{
...
}
private Assembly MyResolveEventHandler(object sender,
ResolveEventArgs args) 183
{
...
}
}
}
Figure 12.2 The layout of the wrapper class written in C# code
180 The MainClassName field contains the name of the actual class containing
the Main() method.
181 This is the Main() method of the wrapper, which creates a new wrapper
instance that sets the assembly resolve event handler and dynamically
loads all the runtime dependencies.
182 This method is the constructor of the HelloWorldWrapper, whose purpose
is to configure the assembly resolve event handler. The implementation
is shown in Figure 12.3.
183 This method defines our custom assembly resolve handler using the
reflection API to dynamically load all required runtime dependencies.
The implementation is shown in Figure 12.4.
Figure 12.3 shows the implementation of the HelloWorldWrapper constructor.
Here, we attach our custom MyResolveEventHandler to the current AppDomain,
then we use the reflection API to load the actual executable assembly, and
finally we invoke its Main() method.
public HelloWorldWrapper(string[] args)
{
// Attach the resolve event handler to the AppDomain
// so that missing library assemblies will be searched
AppDomain currentDomain = AppDomain.CurrentDomain;
currentDomain.AssemblyResolve +=
new ResolveEventHandler(MyResolveEventHandler);
// Dynamically load the executable assembly
Assembly exeAssembly = Assembly.LoadFrom(ExePath);
// Lookup the main class
Type mainClass = exeAssembly.GetType(MainClassName);
// Lookup the main method
MethodInfo mainMethod = mainClass.GetMethod("Main");
// Invoke the main method
mainMethod.Invoke(this, new Object[] {args});
}
Figure 12.3 The implementation of the HelloWorldWrapper constructor
private Assembly MyResolveEventHandler(object sender,
ResolveEventArgs args)
{
// This handler is called only when the common language
// runtime tries to bind to the assembly and fails.
Assembly MyAssembly;
String assemblyPath = "";
String requestedAssemblyName =
args.Name.Substring(0, args.Name.IndexOf(","));
// Search for the right path of the library assembly
foreach(String curAssemblyPath in AssemblySearchPaths)
{
assemblyPath = curAssemblyPath + "/" +
requestedAssemblyName + ".dll";
if(File.Exists(assemblyPath))
break;
}
// Load the assembly from the specified path.
MyAssembly = Assembly.LoadFrom(assemblyPath);
// Return the loaded assembly.
return MyAssembly;
}
Figure 12.4 The implementation of the MyResolveEventHandler method
Figure 12.4 shows the implementation of the MyResolveEventHandler method,
which loads the run-time dependencies through the reflection API. This method
searches all directories defined in AssemblySearchPaths for the presence of the
required library assembly and then loads it using the reflection API.
In this section we have shown a concrete implementation of a wrapper class
for a particular assembly. Normally, this class is generated by a Nix function
called buildWrapper, which takes the name of the executable assembly and the
main class as additional parameters. This function saves end-users the burden
of developing a wrapper class themselves.
There is only one minor caveat, however. The access modifier of the Main()
method in .NET applications is internal by default, which means that we cannot
access it from other assemblies. Therefore, we also have to change the access
modifier of the real Main() method to public.

{dotnetenv, MyAssembly1, MyAssembly2}:                             184

dotnetenv.buildSolution {                                          185
  name = "My.Test.Assembly";
  src = /path/to/source/code;
  slnFile = "Assembly.sln";                                        186
  assemblyInputs = [                                               187
    dotnetenv.assembly20Path
    MyAssembly1
    MyAssembly2
  ];
}
Figure 12.5 Building a .NET library assembly
12.4.4 Usage
Packaging library assemblies
Figure 12.5 shows an example of a Nix function invocation used to build a
Visual Studio project:
184 Similar to ordinary Nix expressions, this expression is also a function
taking several input arguments, such as dotnetenv, which provides the
Visual Studio build function (shown in the previous code fragment) and
the library assemblies which are required to build the project.
185 In the body of the function, we call the buildSolution function that we have
defined earlier in Figure 12.1 with the required parameters.
186 The slnFile parameter refers to the path to the Visual Studio solution file,
which is used to build the component.
187 The assemblyInputs parameter refers to an arbitrary number of library
assemblies that must be used while building the Visual Studio project.
The first assembly, dotnetenv.assembly20Path, refers to the .NET framework
2.0 system assemblies, which are made available to the builder by a Nix
proxy component (similar to the proxy component for the .NET framework
itself). The other assemblies originate from the function arguments of this expression.
Packaging executable assemblies
The Nix expression in Figure 12.5 can be used to build a library assembly,
but for executable assemblies we require a wrapper, as we have explained
earlier.
{dotnetenv, MyAssembly1, MyAssembly2}:
dotnetenv.buildWrapper { 188
name = "My.Test.Assembly";
src = /path/to/source/code;
slnFile = "Assembly.sln";
assemblyInputs = [
dotnetenv.assembly20Path
MyAssembly1
MyAssembly2
];
namespace = "TestNamespace"; 189
mainClassName = "MyTestApplication"; 190
mainClassFile = "MyTestApplication.cs"; 191
}
Figure 12.6 Building a .NET executable assembly
By invoking the dotnetenv.buildWrapper function with a number of additional
arguments, we can automatically build a wrapped executable assembly.
Figure 12.6 shows an example of a Nix function invocation producing a
wrapped executable assembly. The structure of this expression is similar to
the previous expression, with the following differences:
188 Instead of the dotnetenv.buildSolution function, the dotnetenv.buildWrapper
function is invoked, which generates the wrapper class shown in Figure 12.2
based on a number of given parameters.
189 The namespace parameter specifies the namespace of the main class.
190 The mainClassName parameter specifies the name of the main class.
191 The mainClassFile specifies the path to the file containing the main class.
Composing packages
As with ordinary Nix expressions, we also have to compose Visual Studio
components by calling the build function in the previous code fragment with
the right parameters, as shown in Figure 12.7:
192 The dotnetenv attribute is bound to an expression that incorporates the
buildSolution and buildWrapper functions.
193 Here, the build function of the Test Assembly defined in Figure 12.5 is
invoked with its required dependencies.
By using the expression in Figure 12.7, the assembly in our example can be
built from source, including all its library dependencies, by invoking the
following command-line instruction:
$ nix-env -f pkgs.nix -iA MyTestAssembly
The output is produced in:
/nix/store/ri0zzm2hmwg01w2wi0g4a3rnp0z24r8p-My.Test.Assembly
rec {
dotnetfx = ...
stdenv = ...
dotnetenv = import ../dotnetenv { 192
inherit stdenv dotnetfx;
};
MyAssembly1 = import ../MyAssembly1 {
inherit dotnetenv;
};
MyAssembly2 = import ../MyAssembly2 {
inherit dotnetenv;
};
MyTestAssembly = import ../MyTestAssembly { 193
inherit dotnetenv MyAssembly1 MyAssembly2;
};
}
Figure 12.7 pkgs.nix: A partial Nix expression composing .NET packages
12.5 DEPLOYING .NET SERVICES WITH DISNIX
Supporting local deployment of .NET packages through Nix is an essential
precondition to be able to deploy .NET services with Disnix. As with the
Nix package manager, Disnix can also be used on Windows through Cygwin.
However, we must also be able to support the deployment of .NET services on
.NET infrastructure components, such as Microsoft SQL server and Internet
Information Services (IIS).
To support these new types of services, we have developed the following
activation scripts:
mssql-database. This activation script loads a schema on initial startup if
the database does not exist. It uses the OSQL.EXE tool [Visual C# tutorials,
2011], included with Microsoft SQL server, to automatically execute
SQL instructions from shell scripts that check whether the database
exists and create the tables if needed.
iis-webapplication. This activation script activates or deactivates a web ap-
plication on Microsoft Internet Information Services. It uses the MSDe-
ploy.exe tool [Microsoft, 2010c] to automatically activate or deactivate a
web application.
12.5.1 Porting the StaffTracker example
As an example case, we have ported the StaffTracker, a motivating example
shown in Chapter 5, from Java to .NET technology, using C# as an implemen-
tation language, ADO.NET as database manager, WCF to implement web
services and ASP.NET to create the web application front-end.
Apart from porting the code, we have to modify the deployment specifi-
cations as well. For example, we have to specify Disnix expressions for each
service component utilising our .NET deployment infrastructure described in
Section 12.4.

{dotnetenv}:

{zipcodes}:

dotnetenv.buildSolution {
  name = "ZipcodeService";
  src = ../../../../services/webservices/ZipcodeService;
  baseDir = "ZipcodeService";
  slnFile = "ZipcodeService.csproj";
  targets = "Package";
  preBuild = ''
    sed -e 's|.\SQLEXPRESS|${zipcodes.target.hostname}\SQLEXPRESS|' \
        -e 's|Initial Catalog=zipcodes|Initial catalog=${zipcodes.name}|' \
        -e 's|User ID=sa|User ID=${zipcodes.target.msSqlUsername}|' \
        -e 's|Password=admin123$|Password=${zipcodes.target.msSqlPassword}|' \
        Web.config
  '';
}
Figure 12.8 Disnix expression for the ZipcodeService service component
Figure 12.8 shows the Disnix expression for the ZipcodeService part of the
StaffTracker architecture shown in Chapter 5. As with ordinary Disnix expres-
sions, the expression is a nested function taking the intra-dependencies and
inter-dependencies. In the body, we invoke the dotnetenv.buildSolution function
described in Section 12.4 to build a .NET web service.
Apart from porting all Disnix expressions, we must also update the Disnix
services model and change the service types of the web services and web
applications to iis-webapplication, so that they are deployed to an IIS web server,
and the databases to mssql-database, so that they are deployed to a Microsoft
SQL server instance (a sketch of such entries is shown after the command-line
instruction below). By running the following command-line instruction,
all the services of the .NET StaffTracker system are built from source code,
transferred to the right machines in the network and activated (a result is
shown in Figure 12.9):
$ disnix-env -s services.nix -i infrastructure.nix \
-d distribution.nix
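As an illustration, the fragment below sketches what two entries of the updated services model could look like. The structure follows the Disnix services models used earlier in this thesis; customPkgs and the exact attribute values are assumptions made for this example.

zipcodes = {
  name = "zipcodes";
  pkg = customPkgs.zipcodes;
  type = "mssql-database";          # activated by the mssql-database script
};

ZipcodeService = {
  name = "ZipcodeService";
  pkg = customPkgs.ZipcodeService;
  dependsOn = { inherit zipcodes; };
  type = "iis-webapplication";      # activated by the iis-webapplication script
};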
12.6 EXPERIENCE
Apart from the StaffTracker example, we have tried applying the .NET deploy-
ment functions on the Philips medical imaging platform.
The platform was previously deployed by a custom-implemented build
system, which builds Visual Studio projects in a predefined sequential order.
Moreover, before the actual build process starts, a number of template source
files in the entire source tree are modified.
As a consequence, the exact dependencies among the Visual Studio projects
were not explicitly known. Furthermore, it is also difficult to perform correct
builds of components separately and in parallel, to save build time. Rebuilds
are also expensive, as they are performed by processing all Visual Studio
projects and checking which files have changed, which may still take a significant
amount of time.
Figure 12.9 The .NET StaffTracker deployed in a Windows network using Disnix
In order to utilise any of the tools in our reference architecture for dis-
tributed software deployment, we must model all the components and their
build instructions in Nix expressions. This is complicated, as we have to
find out what the exact dependencies are and how the projects can be isolated
(they may have cross-references to files in other projects as well as implicit as-
sumptions about the environment), and laborious, as the platform is very large
and consists of many components. Making the transition to the Nix package
manager cannot be done in a straightforward manner⁴. Furthermore, because
of the size and complexity of the platform, it is also not feasible to make all the
changes needed to improve the deployment process of the entire platform in a short
time window.
⁴ This is not entirely true. We can also consider the entire platform as a single component, but
this does not give end-users any benefits in using our tools. The advantages manifest themselves
if the system that is being deployed is properly componentised.
To improve the deployment process of the medical imaging platform, we
have implemented a new custom build system utilising MSBuild, providing
some degree of compatibility with the old build system, in which we grad-
ually incorporate techniques described in this thesis, such as a dependency-
based build system and build results that are stored in separate directories.
Although these results are isolated, they are not guaranteed to be pure and they only
distinguish variants on a selected number of attributes instead of hash codes.
Furthermore, the system also automatically extracts dependencies from Vi-
sual Studio projects, such as the referenced files and library assemblies, so that
a minimal amount of manual specification is needed from the developers. Other
types of dependencies can be annotated with custom tags in the Visual Studio
solution files. Also, architectural changes are gradually being implemented to
remove implicit references and cross-references between projects. Currently,
we are capable of deploying various subsets of the platform using the Nix
package manager, such as several utility programs.
12.7 DISCUSSION
In this chapter, we have explained how we can deploy .NET components with
Nix and Disnix, by implementing a Nix convenience function that performs
Visual Studio project builds and is able to find the specified required
dependencies. We have also demonstrated how isolated executable assemblies
can find their run-time dependencies by means of a wrapper. In order to make
this work in a strict deployment system such as Nix, we have to make certain
compromises, by means of proxy components, because certain build-time
dependencies are tightly integrated into the host system and must be deployed
by other means.
The deployment functions described in this chapter have a number of limi-
tations. The first limitation is the lack of dependency completeness. Although
most components are captured in a closure, we also have proxy components
(such as the .NET framework) referring to components residing outside the
Nix store. These dependencies are not included when transferring a closure
to another machine. However, the .NET framework is a common package in-
cluded with most recent versions of Microsoft Windows, so this should not
impose any serious problems.
Second, native Windows applications do not implement the same system
features as UNIX applications and cannot be jailed with the chroot() system
call, which may make certain package builds impure if they reference tools
outside the Nix store. Therefore, developers must manually ensure that builds
have no dependencies on components outside the Nix store. Furthermore,
even if we were able to use chroot(), we could not, as proxy components must
still be able to reference tools on the host system.
As explained earlier, Disnix is only designed for deployment of service
components and does not manage the underlying infrastructure components.
To provide complementary infrastructure deployment, we have developed
DisnixOS, as described in Chapter 6. However, we cannot use DisnixOS to
deploy Windows machines as the system services and infrastructure compo-
nents, such as Microsoft SQL server, cannot be isolated in the Nix store due to
their proprietary nature. Infrastructure deployment must be done manually
or by other deployment means.
As a final note, it pays off to have a close look at Szyperski's definition
of component software [Szyperski et al., 2002] and to take its criteria into account
while designing a system. The first important aspect is that all explicit context
dependencies must be known, which was sometimes not the case with the
medical imaging platform components (implemented as Visual Studio pro-
jects). Sometimes libraries could be implicitly found in global system direc-
tories (e.g. C:\WINDOWS\System32), which may allow a build to succeed even
though a dependency has not been specified. Invoking these Visual Studio
builds in Nix resulted in build errors, because of unspecified dependencies.
Furthermore, there were also a number of cross-references between files in
various Visual Studio projects that are (in principle) external dependencies
that were not explicitly specified.
Another important aspect is that every component must be treated as a unit
of independent deployment. Without explicitly knowing the dependencies of
a component, it is very unlikely that a particular component can be safely and
independently deployed on an external machine. Finally, every component
is in principle subject to composition by third parties, which requires imple-
menters to provide means to specify or replace dependencies at build time and
run-time. A number of projects had implicit assumptions about the locations of
their dependencies, which may yield problems if a component is deployed
elsewhere, or if a dependency has to be replaced by a different version or vari-
ant.
Properly componentising a system in advance saves the burden of imple-
menting expensive refactorings of the system's architecture to improve
the deployment process.
12.8 CONCLUSION
In this chapter, we have shown how we can deploy .NET applications on
the Microsoft Windows platform. Systems using .NET technology
are (in principle) impure, as we cannot completely deploy all of their components,
such as the .NET framework, through the Nix package manager.
In order to be able to use our techniques, we had to make a small number of
compromises by means of proxy components, at the expense of dependency
completeness. However, the costs are not that high, as the .NET framework
is a common dependency present on many installations. Moreover, in order to
allow a .NET executable assembly to find its required run-time dependencies,
we had to implement a complex wrapper facility. Although the wrapper is
complicated, its complexity is hidden by the dotnetenv.buildWrapper function that
automatically generates the wrapper class with its required properties.
By using these facilities, we can deploy .NET applications with the Nix
package manager, build and test them on Hydra, and deploy service-oriented
systems using .NET technology with Disnix.
Furthermore, we have described our experiences with migrating a large
code base to the Nix deployment system. A complete transition is not feasible
in a short time window, as, due to architectural erosion, not all components
can be easily isolated. Therefore, we have developed a custom build system
gradually incorporating relevant build techniques.
Part VI
Conclusion
13
Conclusion
In the introduction, we have explained that non-functional requirements on
software systems and various advancements in software engineering have
made the software deployment process, consisting of all activities to make a
software system available for use, very complicated and error prone. For the
latest generation of systems, which are typically offered as services through
the Internet, many new challenges have arisen in addition to the deployment
challenges of the past. Apart from technical challenges, non-functional
deployment requirements must also be met, such as reliability, privacy
and the licenses under which components are governed.
As a solution, we have proposed a reference architecture for distributed
software deployment that implements various deployment concerns. This
reference architecture captures properties of a family of domain-specific de-
ployment tools and can be turned into a concrete architecture for a domain-
specific deployment tool by creating a tool utilising the concrete components
defined in the reference architecture and by integrating custom components.
In the other chapters of this thesis, we have covered the individual com-
ponents of the reference architecture implementing service and infrastructure
deployment, the most important general concepts and a number of applica-
tions, including an example of a domain-specific deployment tool.
13.1 SUMMARY OF CONTRIBUTIONS
Each of the chapters in this thesis has a number of contributions. We
summarise these contributions below:
A reference architecture for distributed software deployment, incorpo-
rating components that solve various deployment aspects and utilise a
combination of the layered virtual machines and the purely functional
batch processing architectural patterns, to make deployment reliable,
reproducible and efficient (Chapter 3).
A toolset to automatically deploy heterogeneous service-oriented sys-
tems in heterogeneous networks in a reliable and reproducible manner
(Chapter 4).
A modular, extensible toolset incorporating deployment planning algo-
rithms described in the literature, to dynamically map services to ma-
chines, based on their technical and non-functional requirements (Chap-
ter 5).
A framework to reliably and efficiently redeploy service-oriented sys-
tems in case of an event that may occur in a network of machines (Chap-
ter 5).
An infrastructure to efficiently instantiate virtual machines and to per-
form distributed system integration tests (Chapter 7).
An extension and case-study to make distributed service upgrades fully
atomic (Chapter 8).
A prototype tool to support reliable and reproducible deployment of
mutable software components (Chapter 9).
A prototype tool to trace build processes and a method to derive how
source artifacts are combined into the resulting executables (Chapter 10).
An example of a domain-specific deployment tool for WebDSL applica-
tions (Chapter 11).
A study of applying our tools in unconventional and impure environ-
ments (Chapter 12).
13.2 EVALUATION
The central goal of this thesis is to construct a reference architecture for dis-
tributed software deployment, which can be used to realise a concrete archi-
tecture for a domain-specific deployment tool. We have used the Nix package
manager as a fundamental basis and we have developed a number of addi-
tional components, capturing several deployment concerns, such as services,
the infrastructure and the licenses under which components are governed.
In Chapter 3, we have also listed several important quality attributes that
we want to achieve with this reference architecture. The primary means to
achieve this goal is to use the purely functional batch processing architectural
pattern. In this section, we evaluate how well we have achieved these quality
attributes.
13.2.1 Reliability
As we have seen in Chapter 3, the reliability attribute covers dependency
completeness, immutability, atomicity and recovery.
Dependency completeness
Dependency completeness, which means that all required dependencies must
be present and correct and that the system must be able to find them, is very well
covered in both Disnix and DisnixOS (which incorporates Charon). In Disnix,
described in Chapter 4, dependency completeness at build time is checked
statically, as service build expressions define both intra-dependencies and
inter-dependencies as function arguments. Properties of these arguments are
used in the build process of the service, resulting in an error if any of these
dependencies is omitted.
At run-time, dependency completeness for intra-dependencies is ensured
by using the same mechanisms as the Nix package manager: by scanning for
hash occurrences of dependencies inside components and by ensuring that
these dependencies are deployed first.
For inter-dependencies, the generated manifest file captures all the runtime
dependencies of services and disnix-activate (invoked by the high-level disnix-
env tool) ensures that all the activation and deactivation steps of services are
performed in the right order, so that we never run into a situation in which an inter-
dependency is missing. However, we cannot ensure that machines (which are
technically also dependencies of a deployment scenario) are always present.
For mutable components described in Chapter 9, the dependencies are
relatively simple. They require the presence of the container service and
command-line tools capable of interacting with the container. The latter
property is ensured, because Dysnomia is also built through Nix, and the
Dysnomia modules contain absolute references to store paths containing the
command-line tools (Nix ensures that they are always present if Dysnomia is
deployed, as described in Chapter 2).
The presence of running container components cannot be automatically
ensured directly through Dysnomia. In our experiments, we typically derive
container properties from NixOS configurations and by configuring Dysnomia
in a NixOS module, which ensures that the container components are present
and correct. However, on non-NixOS systems, other means must be used to
ensure that the container components are present and correct.
Immutability
Immutability, the fact that we store all components in isolation and that they
are never modified imperatively, is achieved due to the fact that interaction
between most of the components in the reference architecture is done through
Nix derivation functions implementing the purely functional batch processing
architectural pattern. As we have seen in Chapter 2, Nix derivation functions
execute procedures in isolated environments, in which many side effects are
removed, and never overwrite or remove files belonging to another compo-
nent. As a result, these procedures (nearly) always yield the same result,
based on their declared input arguments.
Atomicity
Atomicity is a property that ensures that we either have the old configuration
or the new configuration, but never an inconsistent mix of the two.
Local atomicity is a direct consequence of using the purely functional batch
processing architectural pattern, as the outputs of derivation functions are stored
in a purely functional cache, such as the Nix store or the Dysnomia snapshot
hierarchy. Their file names are always unique and thus there is never a situation
in which a component has files belonging to one version and another.
Full distributed atomicity cannot be achieved through this architectural
pattern, but must be implemented in the deployment tools, e.g., Disnix and
Charon, and several properties must be implemented in the application that
is being deployed, as we have seen in Chapter 8. Implementing application-
specific upgrade modifications is not always trivial, for example for web ap-
plications and stateless connections.
So far, we have only experimented with fully distributed atomic upgrades at
the service level and we have implemented a simple prototype example case sup-
porting these kinds of upgrades. However, applying fully distributed atomic
upgrades in general is hard to achieve, as we require cooperation with the sys-
tem that is being deployed.
Recovery
Recovery is the ability to repair a deployed system if a machine with required
dependencies disappears. As with atomicity, recovery is also a property that
must be implemented in tools, such as Disnix and Charon. In Chapter 5, we
have shown a solution that automatically redeploys a service-oriented system
in case of an event, using Disnix. In the evaluation results, we have seen that
for most types of events the downtime is minimal, which suggests that our
deployment model works quite well for this purpose.
However, our recovery approach only works at the service level and not at the
infrastructure level. For example, system services are not reconfigured when nec-
essary (such as adjusting the amount of memory they may consume), which may be
required for certain types of events. Also, the deployment planning algo-
rithms are limited to simple network topologies and there is no sophisticated
support for state migration.
13.2.2 Reproducibility
Reproducibility, the ability to exactly reproduce a configuration elsewhere, is
also a quality attribute that is ensured by using the purely functional batch
processing architectural pattern. Because side-effects are removed, executing
these functions should always yield the same result, regardless on what ma-
chine the function is executed and assuming that the machine can be trusted.
The mechanisms that Disnix (and Charon) use to remotely deploy com-
ponents are a major example of utilising the reproducibility properties to its
fullest. Moreover, another prominent example is the distributed system inte-
gration test facilities built on top of NixOS (as described in Chapter 7), which
uses sharing of the host systems’ Nix store for efficiency.
13.2.3 Genericity
Genericity, being able to support systems using various programming lan-
guages, component technologies and operating systems, is a quality attribute
that is achieved thanks to the fact that arbitrary processes can be invoked
in derivation functions implementing the purely functional batch processing
architectural pattern. These processes can be implemented in any program-
ming language, such as C, Java, Haskell, PHP, Perl and Python, and using any
component technology.
We have also seen in Chapter 12 that, by creating proxy components, we can
invoke processes hosted outside the Nix store, such as MSBuild, which is used
to compile Visual Studio C# projects. However, using proxy components reduces
dependency completeness and reproducibility.
Genericity for mutable components is achieved in Dysnomia, which merely
requires interfaces and their semantics to be implemented in a plugin system. Hence,
we are capable of supporting various mutable components, such as MySQL
databases and Subversion repositories.
13.2.4 Extensibility
Extensibility is achieved by exactly the same means as the genericity quality
attribute. For example, in the Dynamic Disnix framework described in Chap-
ter 5, we also invoke deployment planning methods from derivation functions
allowing to invoke tools implemented in any language.
13.2.5 Efficiency
Several efficiency measures are a direct consequence of using the Nix ex-
pression language and the purely functional batch processing architectural
pattern.
All components in our reference architecture use models implemented in
the Nix expression language, which is a purely functional domain-specific
language. As a consequence, it supports laziness and only evaluates functions
(with certain function arguments) when it is required. As an optimisation in
purely functional languages, the result of a function invocation can be cached
and therefore it only has to be executed once.
Furthermore, the Nix store serves as a persistent cache of
derivation function results. As a result, if a specific component has been
built before (or a configuration has been generated before), we can return the
result from the Nix store instead of evaluating the function again.
Because the dependencies for each process invocation are explicitly known
and these functions have no side effects, we can also easily parallelise tasks
that do not depend on each other.
Dysnomia has a similar persistent cache using a different mechanism and
can use a name-based comparison strategy to check whether to execute a
deployment operation. However, in our experiments we have seen that using
the filesystem as an intermediate layer may be very expensive for large data
sets.
13.3 RESEARCH QUESTIONS REVISITED
Research Question 1
How can we automatically deploy heterogeneous service-oriented systems in a
network of machines in a reliable, reproducible and efficient manner?
As we have seen in Chapter 4, services are a bit different compared to ordi-
nary components. Apart from local dependencies (called intra-dependencies),
they also have dependencies on services which may reside on other machines
in a network (the inter-dependencies).
In Chapter 4, we have described Disnix, as an extension to Nix, in order to
deploy services in a network of machines. As we have seen, Disnix uses three
models each capturing a separate concern to perform the complete deploy-
ment process.
Local reliability, reproducibility and efficiency are ensured thanks to the
fact that Disnix is built completely around Nix and implements the purely
functional batch processing pattern. To ensure reliability in a distributed set-
ting, we have ensured that all inter-dependencies are also always present and
activated in the right order.
In Chapter 8, we have described how to make distributed upgrades fully
atomic. Basically, we perform all build actions in a purely functional manner
first and then we distribute all components. These steps can be performed in
a non-destructive manner without affecting running systems. In the activa-
tion phase, we first notify services, so that they can temporarily block or queue
connections and end users cannot observe that the system is being up-
graded. However, to achieve full atomicity, application-specific changes must
be implemented, such as implementing a proxy.
In Chapter 6, we have described DisnixOS. DisnixOS (which incorporates
Charon) uses similar concepts in its deployment procedure as Disnix, includ-
ing a final imperative activation phase in which system configurations are
activated.
As with Disnix, most deployment actions are also non-destructive, depen-
dencies are always complete and only what is necessary gets evaluated, which
are implications of using the purely functional batch processing architectural
pattern.
We have not yet implemented fully atomic distributed upgrades at the infras-
tructure level.
Research Question 2
How can we efficiently deal with deployment related events occurring in a net-
work of machines?
In Chapter 5, we have shown an extended architecture consisting of a dis-
covery service generating an infrastructure model, a distribution generator,
which generates a distribution model and Disnix which performs redeploy-
ment when an event occurs.
In order to dynamically deploy service-oriented systems, we must generate
a distribution model, in which we map services to machines in the network
based on their technical and non-functional properties.
As we cannot solve this problem in a generic manner, we have implemented
an extensible mechanism in Chapter 5, which can be used to invoke various
deployment planning algorithms. This mechanism can be used to develop a
custom strategy that maps services to machines in the network. The extension
mechanism implements the purely functional batch processing architectural
pattern.
In the evaluation results, we have seen that although the overall deploy-
ment times may be long in some cases, the time in which the system is incon-
sistent (the activation phase) is often very short for most types of events. This
suggests that our framework is sufficiently effective for practical purposes.
Research Question 3
How can we declaratively perform distributed system integration tests?
To perform distributed system integration tests, we have described an ap-
proach in Chapter 7, which cheaply instantiates networks of virtual machines,
based on the fact that we can share a host system’s Nix store (so that no disk
images have to be generated) and that we can capture all static aspects of an
entire operating system and deploy them in a reliable and reproducible man-
ner. Thanks to these features we can instantiate virtual networks in a matter
of seconds.
Furthermore, by implementing a test driver that remotely executes opera-
tions in these virtual machines, we can write test scripts that can be automat-
ically performed. In Chapter 7, we have described a collection of test cases,
that normally would not be tested automatically at all.
Research Question 4
How can we automatically and reliably deploy mutable software components in
a purely functional deployment model?
In Chapter 9, we have developed Dysnomia, which can be used to automat-
ically deploy mutable components. As Dysnomia only requires an interface, it
cannot ensure reliability directly, although the modules we have implemented
all invoke the proper command-line tools capturing state in a portable and
consistent manner. If plugin developers ensure portability and consistency
for a particular type, it can be considered reliable.
Furthermore, in order to isolate snapshots safely, we have designed a new
naming convention that always produces a unique name. The Dysnomia mod-
ule designer has to map an order attribute to the underlying implementation,
such as a revision number.
The major drawback of Dysnomia is, as we have described earlier, effi-
ciency. We are using the filesystem as an intermediate layer, which is very
expensive for certain component types.
Research Question 5
How can we determine under which license components are governed and how
they are used in the resulting product?
We have developed a prototype tool in Chapter 10, in which we instrument
Nix build functions with strace, which traces all the system calls invoked dur-
ing a build process. From these traces we can produce a build graph showing
the involved processes and their inputs and outputs. From such graphs we
can also derive, using a tool such as ninka, which source files are used and under
which licenses they are governed. The evaluation results suggest that the
method is highly effective.
Research Question 6
How can we apply the components of the reference architecture?
As a case study, we have developed a domain-specific deployment tool for
WebDSL in Chapter 11. The tool is composed of a function that produces
a NixOS network specification from a high-level specification, which can be
used by Charon and the NixOS test driver. To generate a NixOS network
specification, we use a number of utility functions and the NixOS module
system to compose various domain-specific system aspects. One of the major
limitations of our case study is that we do not support state migrations.
In Chapter 12, we have applied Nix and Disnix on the Microsoft Windows
operating system to deploy Visual Studio C# projects. As we have seen, certain
aspects of the system cannot be isolated and deployed in a pure manner.
As a “compromise” we have implemented proxy components referring to
components hosted outside the Nix store. Furthermore, in order to run executable
assemblies from an (isolated) Nix store, we have to implement a complex
wrapper class, capable of loading the required library assemblies.
With these modifications, we were capable of deploying .NET components
with Nix and Disnix at the slight expense of purity and reproducibility.
However, we have also seen that, in order to support large codebases, the
architecture must also be partially re-engineered, which is a non-trivial and
very expensive process. Therefore, it is desirable to develop a (partially) au-
tomated solution capable of deriving the components and their dependencies
and of generating build expressions from them.
13.4 RECOMMENDATIONS FOR FUTURE WORK
Although we have implemented all components in our reference architecture
for distributed software deployment and we have achieved most of our quality
attributes for all of these components, there is room for improvement and
extra features to make deployment processes more convenient.
13.4.1 Network topologies
One of the major limitations of Disnix is that it only supports simple net-
work topologies in which machines are directly reachable. Although for most
use cases this is sufficient, and there are also other means to cope with com-
plex hierarchies, e.g. by configuring gateways, it limits the use of more ad-
vanced deployment planning algorithms, such as Avala [Mikic-Rakic et al.,
2005]. Therefore, it would be useful to add network topology features to
Disnix, so that these more advanced deployment planning algorithms can be
used. Another use case may be to construct tree-like network structures, to
make the distribution of large closures more efficient.
Another assumption that we have made is that all network connec-
tions are stable, but in some environments, such as hospitals, certain devices may
only have limited connectivity and may belong to different networks at cer-
tain points in time. Therefore, it may be worth investigating deployment in
ad-hoc networks. An example of such an approach is described by Roussain
and Guidec [2005]. Their approach, however, has a number of drawbacks,
such as the fact that it uses an imperative underlying deployment process
with all its disadvantages. As a consequence, it is difficult to guarantee that all
dependencies are present, specified and correct. Furthermore, their approach
is Java-specific.
13.4.2 Infrastructure-level self-adaptability
In Chapter 5, we have described a framework that redeploys service-oriented
systems in case of an event in the network. In order to allow this approach
to succeed, we always require at least one machine configuration capable of
hosting a particular machine in the network and their machine configurations
are always static.
Therefore, it may also be desirable to investigate self-adaptability at the in-
frastructure level, e.g., by composing system configurations based on the re-
quirements of services and by upgrading or limiting the system resources that
certain system services consume.
13.4.3 More declarative system integration testing of service-oriented systems
In some of our case studies, such as SDS2, we have seen that bindings be-
tween services can be very fragile and require proper handling of connection
failures. For example, it may be very tempting to write a Java web service
connecting to a database as described in Figure 13.1.
In the constructor, the database connection is retrieved from the applica-
tion server's connection pool, but never released. If at some point in time a
connection failure occurs, then it will never be restored. As a consequence,
the getRecords() method can never return data and the web service must be re-
deployed in order to make it operational again. For large systems consisting
of many inter-dependent services, this is undesirable, as it requires system ad-
ministrators to redeploy many services, to know what the exact dependencies
are and to do the redeployment in the right order, which may be complicated
and error prone.
Instead, services must be more robust and catch the proper exceptions and
restore the database connection if it has been disconnected. In SDS2, we have
identified these issues after deploying it in a distributed environment and
running it for a long time and we have implemented proper connection error
handling so that these errors no longer occur.
import java.util.*;
import java.sql.*;
import javax.sql.*;
import javax.naming.*;
public class MyService
{
private Connection connection;
public MyService() throws Exception
{
InitialContext ctx = new InitialContext();
DataSource ds = (DataSource)ctx.lookup("java:comp/env/jdbc/MyDB");
connection = ds.getConnection();
}
public ArrayList<String> getRecords() throws Exception
{
PreparedStatement pstmt =
connection.prepareStatement("select record from records");
ResultSet result = pstmt.executeQuery();
ArrayList<String> list = new ArrayList<String>();
while(result.next())
list.add(result.getString(1));
return list;
}
}
Figure 13.1 MyService.java: An example of a fragile Java web service
Such behaviour can be automatically detected by our distributed system in-
tegration test facilities described in Chapter 7. However, the service-oriented
testing facilities have a number of drawbacks. First, we must always manually
provide a Disnix infrastructure model and distribution model that properly
deploy all the service components, which may be very laborious and te-
dious to do for a system that is continuously evolving. Second, we must also
manually execute test operations on the right machines in the network.
As a solution, we can use the vertex coloring deployment planning algo-
rithm from the Dynamic Disnix framework (described in Chapter 5) to gen-
erate a distribution model. Furthermore, we can also annotate services in the
Disnix services model with predicates (command-line instructions that check
a service's validity) and use a strategy that traverses the inter-dependency closures,
disables and re-enables connections, and checks these predicates to determine whether
a service still works. The only things the end-user has to specify are the predicates
that check for validity and a machine configuration capable of running all
types of services. Currently, we have a very basic implementation available,
which is not very mature yet.
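The sketch below illustrates this idea for a single service; the testPredicate attribute and the command it contains are hypothetical and do not correspond to the attribute names used by the current prototype.

ZipcodeService = {
  name = "ZipcodeService";
  pkg = customPkgs.ZipcodeService;
  dependsOn = { inherit zipcodes; };
  type = "tomcat-webapplication";
  # Hypothetical predicate: a command-line instruction that only succeeds if
  # the service still responds after its connections have been re-enabled.
  testPredicate = "curl --fail http://localhost:8080/ZipcodeService/";
};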
13.4.4 Test scalability
Another limitation of the distributed system integration approach is that all
virtual machines are located on one single machine, which is inadequate for
large-scale testing. Therefore, it is worth exploring an extended approach
in which a network of physical machines hosting virtual machines can be
generated to perform large-scale test cases.
13.4.5 Patterns for fully distributed atomic upgrades
As we have seen in Chapter 8, in order to achieve fully atomic upgrades
cooperation with the system that is being deployed is required. So far, we only
have developed a TCP proxy that “drains” connections during the activation
phase, consisting of blocking or queuing incoming connections and waiting
until all already active connections are closed, so that end users are not able
to observe that a system is changing.
For example, to upgrade running web applications, more cooperation is
needed. Apart from draining the connections, we also have to refresh run-
ning web applications (which requires developers to implement a push noti-
fication mechanism) and eventually migrate state. It would be interesting to
identify such patterns and make these available through a library or another
abstraction facility.
13.4.6 Implementing a license calculus system
In Chapter 10, we have shown how we can construct graphs from dynami-
cally traced build processes, showing the tasks involved and their inputs and
outputs. Currently, licensing issues must still be analysed manually. For ex-
ample, we must manually judge whether several combined source files governed
under different software licenses are compatible and what their resulting li-
cense is. Therefore, it is interesting to construct a license calculus capable of
determining under which license a resulting artifact is governed¹.
A possible solution is to construct a set of rules for each combination of
licenses and link type (which can be determined from the process that uses
the files that are governed under a specific license) and to rewrite the entire
graph until a license is determined for the resulting build artifact.
¹ This does not mean that such a tool could always judge whether a particular composition of
components is always legal, as only lawyers can do that.
13.4.7 More sophisticated state deployment
Although we have implemented a prototype tool called Dysnomia in Chap-
ter 9 that is capable of deploying mutable components, it has various draw-
backs, such as efficiency issues. Therefore, it is desirable to investigate this
subject further and eventually develop a better, more efficient discipline.
It has turned out that this problem is very difficult to solve efficiently
in a generic manner.
13.4.8 Extracting deployment specifications from codebases
In Chapter 12, we have implemented an approach capable of automatically
deploying .NET components. However, in order to fully deploy a large code-
base with our tools, we need deployment specifications for all components,
which must be manually written. Manually writing deployment specifications
for large codebases can be very expensive, tedious and laborious.
However, most of the relevant deployment attributes are already captured
in configuration files of the IDE, such as Eclipse or Visual Studio IDE project
files. Therefore, it would be interesting to develop an automated approach
capable of generating Nix expressions from IDE project files to reduce the
amount of maintenance. Some IDEs have integration with build systems,
such as Apache Maven [Apache Software Foundation, 2010]. However, these
tools are restricted to manage build and test processes of components and
not designed to manage the full chain of deployment activities, such as dis-
tributed deployment of entire systems into network of machines. Further-
more, these tools are technology specific and do not implement a number
of important quality attributes, such as dependency completeness and repro-
ducibility. Therefore, they must be properly invoked from Nix expressions,
which ensures these attributes, which is not always a straight forward process,
as we have explained in Chapter 12.
In the Nix project, we already have two tools capable of extracting Nix expressions from specifications in a different format: cabal2nix (https://github.com/NixOS/cabal2nix), which converts Haskell package specifications to Nix expressions, and npm2nix (https://bitbucket.org/shlevy/npm2nix), which converts Node.js (http://nodejs.org) package manager specifications.
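As an illustration of this direction (not an existing tool: the MSBuild element names are taken from ordinary Visual Studio project files, but the generated expression and the buildDotnetPackage function are assumptions), the following sketch extracts the assembly references from a project file and emits a corresponding Nix expression:

import xml.etree.ElementTree as ET

MSBUILD_NS = {"m": "http://schemas.microsoft.com/developer/msbuild/2003"}

def nixify(name):
    # Nix function arguments cannot contain dots; map them to underscores.
    return name.replace(".", "_")

def project_to_nix(csproj_path, package_name):
    tree = ET.parse(csproj_path)
    references = [nixify(r.attrib["Include"].split(",")[0])
                  for r in tree.findall(".//m:Reference", MSBUILD_NS)]
    arguments = ", ".join(["buildDotnetPackage"] + references)
    inputs = " ".join(references)
    return f"""{{ {arguments} }}:

buildDotnetPackage {{
  name = "{package_name}";
  src = ./.;
  assemblyInputs = [ {inputs} ];
}}
"""

# Example: print(project_to_nix("MyApp.csproj", "my-app"))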
13.4.9 Case studies with service composability
One of the missing aspects in supporting service orientation in hospital en-
vironments, as described in Section 1.2, is providing a solution for flexible
service compositions. In order to make a specific service available for use in
a heterogeneous environment consisting of devices with varying properties
and capabilities, we may have to offer a fat-client variant of a service for powerful workstations and a thin-client variant for mobile phones. Possibly, a number of hybrid variants must also be offered, depending on the amount of system resources available.
Although our reference architecture has the facilities to deploy such services, there is still a need for a concrete, hands-on case study showing the deployment of variants of the same service. In order to achieve this goal, intensive cooperation with industry is required.
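As a minimal illustration of what such variability could look like (the variant names and resource thresholds are invented for this example, and this policy is not part of our current tooling), a capability-based selection could pick the richest variant of a service that still fits on a target device:

VARIANTS = {
    # minimum amount of RAM (in MiB) required by each variant of a service
    "fat-client": 4096,
    "hybrid": 1024,
    "thin-client": 128,
}

def select_variant(device_ram_mib):
    # Pick the richest variant that still fits on the device.
    for variant, required in sorted(VARIANTS.items(),
                                    key=lambda item: item[1], reverse=True):
        if device_ram_mib >= required:
            return variant
    raise ValueError("the device cannot run any variant of this service")

# select_variant(8192) == "fat-client"; select_variant(512) == "thin-client"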
13.4.10 Software deployment research in general
As a final note, which is not specific to this thesis, we also think that the software deployment discipline in general deserves more attention in the research world. For many subfields of software engineering there are specialised conferences and workshops available, such as
2
https://github.com/NixOS/cabal2nix
3
https://bitbucket.org/shlevy/npm2nix
4
http://nodejs.org
software language engineering, testing, reverse engineering, repository mining and program comprehension. However, for software deployment there is no subfield conference available (there used to be the Working Conference on Component Deployment (CD), http://www.informatik.uni-trier.de/~ley/db/conf/cd/cd2005.html, which has not been held since 2005).
The papers used throughout this thesis have been published at a wide range of conferences and workshops, for example in the areas of reliability engineering and self-adaptive systems. Sometimes applying software deployment in a different domain is appreciated, but sometimes it is not and is misunderstood. Moreover, the software deployment discipline is relatively unknown in the research world, and there is no consensus yet on what good directions are and how to properly evaluate software deployment techniques. Therefore, it may be worthwhile to revive the Working Conference on Component Deployment (CD) or to organise a software deployment workshop, to investigate current software deployment practices, and to develop new research directions. Recently, a new workshop on Release Engineering (RELENG, http://releng.polymtl.ca) has been announced that has a big overlap with software deployment.
Bibliography
Abate, P., Cosmo, R. D., Treinen, R., and Zacchiroli, S. (2013). A modular
package manager architecture. Information and Software Technology, 55(2):459–474. (Cited on page 61.)
Abate, P., Di Cosmo, R., Boender, J., and Zacchiroli, S. (2009). Strong de-
pendencies between software components. In 3rd International Symposium on
Empirical Software Engineering and Measurement (ESEM), pages 89–99. (Cited
on page 47.)
Adams, B., Tromp, H., Schutter, K. D., and Meuter, W. D. (2007). Design
recovery and maintenance of build systems. In ICSM, pages 114123. (Cited
on pages 187 and 199.)
Akkerman, A., Totok, A., and Karamcheti, V. (2005). Infrastructure for Au-
tomatic Dynamic Deployment of J2EE Applications in Distributed Environ-
ments. In CD 05: Proc. of the 3rd Working Conf. on Component Deployment,
pages 1732. Springer-Verlag. (Cited on pages 14, 66, and 93.)
Apache Software Foundation (2010). Apache maven. http://maven.
apache.org. (Cited on pages 71 and 248.)
Artho, C., Suzaki, K., Di Cosmo, R., Treinen, R., and Zacchiroli, S. (2012).
Why do software packages conflict? In 9th IEEE Working Conference on Mining
Software Repositories (MSR), pages 141150. (Cited on page 47.)
Attariyan, M. and Flinn, J. (2008). Using causality to diagnose configuration
bugs. In 2008 USENIX Annual Technical Conference (USENIX 08). USENIX.
(Cited on page 199.)
Baresi, L. and Pasquale, L. (2010). Live goals for adaptive service composi-
tions. In Proc. of the 2010 ICSE Workshop on Software Engineering for Adaptive
and Self-Managing Systems, SEAMS 10, pages 114123, New York, NY, USA.
ACM. (Cited on page 116.)
Bass, L., Clements, P., and Kazman, R. (2003). Software Architecture in Practice.
Addison Wesley, second edition. (Cited on pages 15 and 49.)
Beck, K., Beedle, M., van Bennekum, A., Cockburn, A., Cunningham, W.,
Fowler, M., Grenning, J., Highsmith, J., Hunt, A., Jeffries, R., Kern, J., Marick,
B., Martin, R. C., Mellor, S., Schwaber, K., Sutherland, J., and Thomas, D.
(2001). Manifesto for agile software development. Available at http://
www.agilemanifesto.org. (Cited on page 13.)
Begnum, K. (2006). Managing large networks of virtual machines. In LISA06:
Proc. of the 20th Conference on Large Installation System Administration Con-
ference, pages 205–214, Berkeley, CA, USA. USENIX. (Cited on pages 127
and 153.)
Bravenboer, M., Kalleberg, K. T., Vermaas, R., and Visser, E. (2008). Strate-
go/XT 0.17. A language and toolset for program transformation. Science of
Computer Programming, 72(1-2):5270. Special issue on experimental software
and toolkits. (Cited on page 208.)
Bravenboer, M. and Visser, E. (2004). Concrete syntax for objects: domain-
specific language embedding and assimilation without restrictions. In OOP-
SLA 04: Proceedings of the 19th annual ACM SIGPLAN conference on Object-
oriented programming, systems, languages, and applications, pages 365383, New
York, NY, USA. ACM. (Cited on page 208.)
Brélaz, D. (1979). New methods to color the vertices of a graph. Commun.
ACM, 22(4):251256. (Cited on page 112.)
Build Audit (December 6, 2004). Build Audit. http://buildaudit.
sourceforge.net. (Cited on page 199.)
Buildbot project (2012). Buildbot manual. Available at http:
//buildbot.net/buildbot/docs/current/manual/index.html.
(Cited on page 42.)
Burgess, M. (1995). Cfengine: a site configuration engine. Computing Systems,
8(3). (Cited on pages 60, 122, 123, 152, and 178.)
Camara, J., Canal, C., and Salaun, G. (2009). Behavioural self-adaptation
of services in ubiquitous computing environments. In Proc. of the 2009
ICSE Workshop on Software Engineering for Adaptive and Self-Managing Sys-
tems, pages 2837, Washington, DC, USA. IEEE Computer Society. (Cited
on page 116.)
Cappos, J., Baker, S., Plichta, J., Nyugen, D., Hardies, J., Borgard, M., John-
ston, J., and Hartman, J. H. (2007). Stork: package management for dis-
tributed VM environments. In LISA07: Proceedings of the 21st conference on
Large Installation System Administration Conference, pages 116, Berkeley, CA,
USA. USENIX. (Cited on page 152.)
Caron, E., Chouhan, P. K., and Dail, H. (2006). GoDIET: A Deployment
Tool for Distributed Middleware on Grid 5000. Technical Report RR-5886,
Laboratoire de l’Informatique du Parallélisme (LIP). (Cited on pages 14, 66,
and 94.)
Carzaniga, A., Fuggetta, A., Hall, R. S., Heimbigner, D., van der Hoek, A.,
and Wolf, A. L. (1998). A characterization framework for software deploy-
ment technologies. Technical Report CU-CS-857-98, University of Colorado.
(Cited on page 3.)
Caseau, Y. (2005). Self-adaptive middleware: Supporting business process
priorities and service level agreements. Advanced Engineering Informatics,
19(3):199–211. (Cited on page 116.)
Chan, K. S. M. and Bishop, J. (2009). The design of a self-healing compo-
sition cycle for web services. In Proc. of the 2009 ICSE Workshop on Software
Engineering for Adaptive and Self-Managing Systems, pages 2027, Washington,
DC, USA. IEEE Computer Society. (Cited on page 116.)
Chen, H., Yu, J., Chen, R., Zang, B., and Yew, P.-C. (2007). Polus: A powerful
live updating system. In Proceedings of the 29th international conference on
Software Engineering, ICSE 07, pages 271281, Washington, DC, USA. IEEE
Computer Society. (Cited on page 166.)
Chen, X. and Simons, M. (2002). A component framework for dynamic re-
configuration of distributed systems. In CD 02: Proc. of the IFIP/ACM Work-
ing Conf. on Component Deployment, pages 8296. Springer-Verlag. (Cited on
pages 66 and 94.)
Cheng, B. H., Giese, H., Inverardi, P., Magee, J., de Lemos, R., Andersson, J.,
Becker, B., Bencomo, N., Brun, Y., Cukic, B., Serugendo, G. D. M., Dustdar,
S., Finkelstein, A., Gacek, C., Geihs, K., Grassi, V., Karsai, G., Kienle, H.,
Kramer, J., Litoiu, M., Malek, S., Mirandola, R., Müller, H., Park, S., Shaw,
M., Tichy, M., Tivoli, M., Weyns, D., and Whittle, J. (2008). 08031 soft-
ware engineering for self-adaptive systems: A research road map. In Cheng,
B. H. C., de Lemos, R., Giese, H., Inverardi, P., and Magee, J., editors, Soft-
ware Engineering for Self-Adaptive Systems, number 08031 in Dagstuhl Seminar
Proceedings, Dagstuhl, Germany. Schloss Dagstuhl - Leibniz-Zentrum fuer
Informatik, Germany. (Cited on page 115.)
Cicchetti, A., Ruscio, D., Pelliccione, P., Pierantonio, A., and Zacchiroli, S.
(2010). A model driven approach to upgrade package-based software sys-
tems. In Evaluation of Novel Approaches to Software Engineering, volume 69,
pages 262276. Springer Berlin Heidelberg. (Cited on pages 48 and 178.)
Cicchetti, A., Ruscio, D. D., Pelliccione, P., Pierantonio, A., and Zacchiroli,
S. (2009). Towards a model driven approach to upgrade complex software
systems. In 4th International Conference on Evaluation of Novel Approaches to
Software Engineering (ENASE 2009). (Cited on page 126.)
Coetzee, D., Bhaskar, A., and Necula, G. (2011). apmake: A reliable parallel
build manager. In 2011 USENIX Annual Technical Conference (USENIX 11).
USENIX. (Cited on page 200.)
Costanza, P. (2002). Dynamic Replacement of Active Objects in the Gilgul
Programming Language. In CD 02: Proc. of the IFIP/ACM Working Conf. on
Component Deployment, pages 125140. Springer-Verlag. (Cited on pages 66
and 94.)
de Jonge, M. (2005). Build-level components. IEEE Transactions on Software
Engineering, 31(7):588600. (Cited on page 185.)
de Jonge, M., van der Linden, W., and Willems, R. (2007). eServices for
Hospital Equipment. In Krämer, B., Lin, K.-J., and Narasimhan, P., editors,
5th Intl. Conf. on Service-Oriented Computing (ICSOC 2007), pages 391–397.
(Cited on pages 66 and 101.)
Dearle, A. (2007). Software deployment, past, present and future. In FOSE
07: 2007 Future of Software Engineering, pages 269284, Washington, DC,
USA. IEEE Computer Society. (Cited on page 173.)
Deb, D., Oudshoorn, M. J., and Paxton, J. (2008). Self-managed deployment
in a distributed environment via utility functions. In 20th International Con-
ference on Software Engineering and Knowledge Engineering (SEKE 08). (Cited
on page 116.)
den Breejen, W. (2008). Managing state in a purely functional deployment
model. Master’s thesis, Faculty of Science, Utrecht University, The Nether-
lands. (Cited on page 178.)
Di Cosmo, R. and Vouillon, J. (2011). On software component co-
installability. In Proceedings of the 19th ACM SIGSOFT symposium and the
13th European conference on Foundations of software engineering, ESEC/FSE 11,
pages 256266, New York, NY, USA. ACM. (Cited on page 47.)
Di Cosmo, R., Zacchiroli, S., and Trezentos, P. (2008). Package upgrades in
foss distributions: details and challenges. In Proceedings of the 1st International
Workshop on Hot Topics in Software Upgrades, HotSWUp '08, pages 7:1–7:5,
New York, NY, USA. ACM. (Cited on page 47.)
Dijkstra, E. W. (1959). Communication with an Automatic Computer. PhD thesis,
University of Amsterdam. (Cited on page 4.)
Dolstra, E. (2005a). Efficient upgrading in a purely functional compo-
nent deployment model. In Proceedings of the 8th international conference on
Component-Based Software Engineering, CBSE’05, pages 219234, Berlin, Hei-
delberg. Springer-Verlag. (Cited on page 30.)
Dolstra, E. (2005b). Secure sharing between untrusted users in a transparent
source/binary deployment model. In Proceedings of the 20th IEEE/ACM inter-
national Conference on Automated software engineering, ASE 05, pages 154163,
New York, NY, USA. ACM. (Cited on page 33.)
Dolstra, E. (2006). The Purely Functional Software Deployment Model. PhD
thesis, Faculty of Science, Utrecht University, The Netherlands. (Cited on
pages iv, 4, 15, 23, and 73.)
Dolstra, E., Bravenboer, M., and Visser, E. (2005). Service configuration man-
agement. In SCM 05: Proceedings of the 12th international workshop on Software
configuration management, pages 8398, New York, NY, USA. ACM. (Cited on
page 37.)
Dolstra, E. and Löh, A. (2008). NixOS: A purely functional Linux distribu-
tion. In ICFP 2008: 13th ACM SIGPLAN Intl. Conf. on Functional Programming.
ACM. (Cited on pages 15, 30, and 35.)
Dolstra, E., Löh, A., and Pierron, N. (2010). NixOS: A purely functional
Linux distribution. Journal of Functional Programming, 20. (Cited on pages 15,
35, and 38.)
Dolstra, E., Vermaas, R., and Levy, S. (2013). Charon: Declarative provi-
sioning and deployment. In 1st International Workshop on Release Engineering
(RELENG). IEEE Computer Society. To appear. (Cited on pages 122 and 130.)
Dolstra, E. and Visser, E. (2008). The Nix Build Farm: A Declarative Ap-
proach to Continuous Integration. In Mens, K., van den Brand, M., Kuhn, A.,
Kienle, H. M., and Wuyts, R., editors, International Workshop on Advanced Soft-
ware Development Tools and Techniques (WASDeTT 2008). (Cited on page 41.)
Dolstra, E., Visser, E., and de Jonge, M. (2004). Imposing a memory manage-
ment discipline on software deployment. In Proc. 26th Intl. Conf. on Software
Engineering (ICSE 2004), pages 583592. IEEE Computer Society. (Cited on
pages 15, 23, 73, and 192.)
Dumitras, T., Tan, J., Gho, Z., and Narasimhan, P. (2007). No more Hot-
Dependencies: toward dependency-agnostic online upgrades in distributed
systems. In HotDep’07: Proc. of the 3rd workshop on Hot Topics in System De-
pendability, page 14, Berkeley, CA, USA. USENIX Association. (Cited on
page 94.)
Edgewall Software (2009). Trac integrated SCM & project management.
http://trac.edgewall.org/. (Cited on page 151.)
Eilam, T., Kalantar, M., Konstantinou, A., Pacifici, G., Pershing, J., and
Agrawal, A. (2006). Managing the configuration complexity of distributed
applications in internet data centers. Communications Magazine, IEEE,
44(3):166–177. (Cited on pages 15, 60, 94, 95, and 178.)
El Maghraoui, K., Meghranjani, A., Eilam, T., Kalantar, M., and Konstanti-
nou, A. V. (2006). Model driven provisioning: bridging the gap between
declarative object models and procedural provisioning tools. In Proceedings
of the ACM/IFIP/USENIX 2006 International Conference on Middleware, Middle-
ware 06, pages 404423, New York, NY, USA. Springer-Verlag New York,
Inc. (Cited on pages 60 and 95.)
Feldman, S. I. (1979). Make—a program for maintaining computer programs.
Software—Practice and Experience, 9(4):255–265. (Cited on pages 61 and 184.)
Fischer, J., Majumdar, R., and Esmaeilsabzali, S. (2012). Engage: a deploy-
ment management system. In PLDI, pages 263274. (Cited on page 95.)
Foster-Johnson, E. (2003). Red Hat RPM Guide. John Wiley & Sons. Also at
http://fedora.redhat.com/docs/drafts/rpm-guide-en/. (Cited
on pages 7, 23, 124, and 189.)
Fowler, M. and Foemmel, M. (2005). Continuous integration. http://www.
martinfowler.com/articles/continuousIntegration.html. Ac-
cessed 11 August 2005. (Cited on page 150.)
Free Software Foundation (2012a). What is copyleft? Available at http:
//www.gnu.org/copyleft. (Cited on page 7.)
Free Software Foundation (2012b). What is free software? http://www.
gnu.org/philosophy/free-sw.html. (Cited on page 7.)
FreeBSD project (2012). About freebsd ports. Available at http://www.
freebsd.org/ports. (Cited on page 46.)
Freedesktop Project (2010a). Avahi. http://avahi.org. (Cited on
page 105.)
Freedesktop Project (2010b). freedesktop.org - software/dbus. http://
www.freedesktop.org/wiki/Software/dbus. (Cited on page 90.)
Freeman, S., Mackinnon, T., Pryce, N., and Walnes, J. (2004). Mock roles, not
objects. In Vlissides, J. M. and Schmidt, D. C., editors, Companion to the 19th
Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems,
Languages, and Applications (OOPSLA 2004), pages 236246. ACM. (Cited on
page 153.)
Garlan, D., Cheng, S.-W., Huang, A.-C., Schmerl, B., and Steenkiste, P. (2004).
Rainbow: architecture-based self-adaptation with reusable infrastructure.
Computer, 37(10):4654. (Cited on page 117.)
German, D. M. and Hassan, A. E. (2009). License integration patterns: Ad-
dressing license mismatches in component-based development. In Proceed-
ings of the 31st International Conference on Software Engineering, ICSE 09, pages
188198, Washington, DC, USA. IEEE Computer Society. (Cited on pages 181
and 183.)
German, D. M., Manabe, Y., and Inoue, K. (2010). A sentence-matching
method for automatic license identification of source code files. In Proceed-
ings of the IEEE/ACM international conference on Automated software engineering,
ASE '10, pages 437–446, New York, NY, USA. ACM. (Cited on page 189.)
Germán, D. M., Penta, M. D., and Davies, J. (2010). Understanding and
auditing the licensing of open source software distributions. In ICPC, pages
8493. (Cited on pages 196 and 199.)
Graphviz (2010). Graphviz. http://www.graphviz.org. (Cited on
page 91.)
Grechanik, M., Xie, Q., and Fu, C. (2009). Maintaining and evolving GUI-
directed test scripts. In ICSE 09: 31st Intl. Conf. on Software Engineering,
pages 408418, Los Alamitos, CA, USA. IEEE Computer Society. (Cited on
page 141.)
Groenewegen, D. and Visser, E. (2008). Declarative access control for
WebDSL: Combining language integration and separation of concerns. In
Schwabe, D. and Curbera, F., editors, International Conference on Web En-
gineering (ICWE’08), Yorktown Heights, New York, USA. IEEE. (Cited on
page 205.)
Grundy, J., Kaefer, G., Keong, J., and Liu, A. (2012). Guest editors’ introduc-
tion: Software engineering for the cloud. IEEE Software, 29:2629. (Cited on
page 9.)
Guennoun, K., Drira, K., Wambeke, N. V., Chassot, C., Armando, F., and Ex-
posito, E. (2008). A framework of models for QoS-oriented adaptive deploy-
ment of multi-layer communication services in group cooperative activities.
Computer Communications, 31(13):3003–3017. (Cited on page 116.)
Hall, R. S., Heimbigner, D., and Wolf, A. L. (1999). A cooperative approach
to support software deployment using the software dock. In ICSE 99: Proc.
of the 21st Intl. Conf. on Software Engineering, pages 174183, New York, NY,
USA. ACM. (Cited on pages 14, 61, 94, and 127.)
Hemel, A., Kalleberg, K. T., Vermaas, R., and Dolstra, E. (2011). Finding soft-
ware license violations through binary code clone detection. In Proceedings
of the 8th working conference on Mining Software Repositories, MSR 11, pages
6372, New York, NY, USA. ACM. (Cited on page 181.)
Hemel, Z., Kats, L. C. L., Groenewegen, D. M., and Visser, E. (2010). Code
generation by model transformation: a case study in transformation modu-
larity. Software and System Modeling, 9(3):375402. (Cited on page 205.)
Hemel, Z., Verhaaf, R., and Visser, E. (2008). Webworkflow: An object-
oriented workflow modeling language for web applications. In Czarnecki,
K., editor, International Conference on Model Driven Engineering Languages and
Systems (MODELS’ 08), Lecture Notes in Computer Science. Springer. (Cited
on page 205.)
Hewson, J. A., Anderson, P., and Gordon, A. D. (2012). A declarative ap-
proach to automated configuration. In Proceedings of the 26th international
conference on Large Installation System Administration: strategies, tools, and tech-
niques, lisa’12, pages 5166, Berkeley, CA, USA. USENIX Association. (Cited
on page 127.)
Heydarnoori, A. and Mavaddat, F. (2006). Reliable deployment of
component-based applications into distributed environments. 3rd Intl. Conf.
on Information Technology: New Generations, 0:5257. (Cited on pages 108
and 112.)
Heydarnoori, A., Mavaddat, F., and Arbab, F. (2006). Deploying loosely cou-
pled, component-based applications into distributed environments. IEEE In-
ternational Conference on the Engineering of Computer-Based Systems, 0:93102.
(Cited on pages 108 and 112.)
Hibler, M., Ricci, R., Stoller, L., Duerig, J., Guruprasad, S., Stack, T., Webb,
K., and Lepreau, J. (2008). Large-scale virtualization in the Emulab network
testbed. In 2008 USENIX Annual Technical Conference, pages 113128, Berke-
ley, CA, USA. USENIX. (Cited on page 153.)
Hudak, P. (1989). Conception, evolution, and application of functional pro-
gramming languages. ACM Comput. Surv., 21(3):359411. (Cited on pages 15
and 23.)
ifrOSS (2004–2009). GPL court decisions in Europe. http://www.ifross.
org/ifross_html/links_en.html#Urteile. (Cited on page 183.)
Kats, L. C. L. and Visser, E. (2010). The Spoofax language workbench. Rules
for declarative specification of languages and IDEs. In Rinard, M., editor,
Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented
Programming, Systems, Languages, and Applications, OOPSLA 2010, October 17-
21, 2010, Reno, NV, USA, pages 444463. (Cited on page 208.)
Konstantinou, A. V., Eilam, T., Kalantar, M., Totok, A. A., Arnold, W., and
Snible, E. (2009). An architecture for virtual solution composition and de-
ployment in infrastructure clouds. In Proceedings of the 3rd international work-
shop on Virtualization technologies in distributed computing, VTDC 09, pages
9–18, New York, NY, USA. ACM. (Cited on pages 60 and 95.)
Kramer, J. and Magee, J. (1990). The evolving philosophers problem:
dynamic change management. IEEE Transactions on Software Engineering,
16(11):12931306. (Cited on page 166.)
Kramer, J. and Magee, J. (2007). Self-managed systems: an architectural
challenge. In 2007 Future of Software Engineering, FOSE 07, pages 259268,
Washington, DC, USA. IEEE Computer Society. (Cited on page 115.)
Larson, P., Hinds, N., Ravindran, R., and Franke, H. (2003). Improving the
Linux Test Project with kernel code coverage analysis. In Proceedings of the
2003 Ottawa Linux Symposium. (Cited on page 149.)
Laszewski, G. v., Blau, E., Bletzinger, M., Gawor, J., Lane, P., Martin, S., and
Russell, M. (2002). Software, component, and service deployment in compu-
tational grids. In CD 02: Proc. of the IFIP/ACM Working Conf. on Component
Deployment, pages 244256. Springer-Verlag. (Cited on page 94.)
Ma, X., Baresi, L., Ghezzi, C., Panzica La Manna, V., and Lu, J. (2011).
Version-consistent dynamic reconfiguration of component-based distributed
systems. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th Eu-
ropean conference on Foundations of software engineering, ESEC/FSE 11, pages
245255, New York, NY, USA. ACM. (Cited on page 166.)
MacKenzie, D. (2006). Autoconf, Automake, and Libtool: Autoconf, Automake,
and Libtool. Red Hat, Inc. Also at http://sources.redhat.com/
autobook/autobook/autobook.html. (Cited on pages 26 and 208.)
Malek, S., Medvidovic, N., and Mikic-Rakic, M. (2012). An extensible frame-
work for improving a distributed software system’s deployment architecture.
IEEE Transactions on Software Engineering, 38(1):73100. (Cited on page 107.)
Mancinelli, F., Boender, J., di Cosmo, R., Vouillon, J., Durak, B., Leroy, X., and
Treinen, R. (2006). Managing the complexity of large free and open source
package-based software distributions. In Proceedings of the 21st IEEE/ACM In-
ternational Conference on Automated Software Engineering, ASE '06, pages 199–208, Washington, DC, USA. IEEE Computer Society. (Cited on page 47.)
MaxMind (2010). MaxMind - GeoIP | IP Address Location Technology.
http://www.maxmind.com/app/ip-location. (Cited on page 99.)
McIntosh, S., Adams, B., Nguyen, T. H., Kamei, Y., and Hassan, A. E. (2011).
An empirical study of build maintenance effort. In Proceedings of the 33rd
International Conference on Software engineering, ICSE 11, pages 141150, New
York, NY, USA. ACM. (Cited on page 185.)
Medvidovic, N. and Malek, S. (2007). Software deployment architecture and
quality-of-service in pervasive environments. In International workshop on
Engineering of software services for pervasive environments, ESSPE '07, pages 47–51, New York, NY, USA. ACM. (Cited on page 107.)
Microsoft (2010a). Global assembly cache. http://msdn.microsoft.
com/en-us/library/yf1d93sz.aspx. (Cited on pages 46 and 220.)
Microsoft (2010b). How the runtime locates assemblies. http://msdn.
microsoft.com/en-us/library/yx7xezcf%28v=vs.110%29.aspx.
(Cited on page 224.)
Microsoft (2010c). Installing iis 7 on windows vista and windows
7. http://learn.iis.net/page.aspx/28/installing-iis-on-
windows-vista-and-windows-7. (Cited on page 229.)
Microsoft (2012). Windows Installer. http://msdn.microsoft.com/en-
us/library/cc185688%28VS.85%29.aspx. (Cited on page 46.)
Microsoft Support (2007). How to load an assembly at runtime that is located
in a folder that is not the bin folder of the application. http://support.
microsoft.com/kb/837908. (Cited on page 224.)
Mikic-Rakic, M., Malek, S., and Medvidovic, N. (2005). Improving Availabil-
ity in Large, Distributed Component-Based Systems Via Redeployment. In
Dearle, A. and Eisenbach, S., editors, Component Deployment, volume 3798 of
Lecture Notes in Computer Science, pages 8398. Springer Berlin / Heidelberg.
(Cited on pages 115, 116, and 244.)
Mikic-Rakic, M. and Medvidovic, N. (2002). Architecture-level support for
software component deployment in resource constrained environments. In
CD 02: Proc. of the IFIP/ACM Working Conf. on Component Deployment, pages
3150. Springer-Verlag. (Cited on page 94.)
Miłoš, G., Murray, D. G., Hand, S., and Fetterman, M. A. (2009). Satori: En-
lightened page sharing. In 2009 USENIX Annual Technical Conference, pages
115, Berkeley, CA, USA. USENIX. (Cited on page 152.)
Moser, M. and O’Brien, T. (2011). The Hudson Book. Oracle Inc., 1 edition.
(Cited on page 42.)
Muthitacharoen, A., Chen, B., and Mazières, D. (2001). A low-bandwidth
network file system. In Proceedings of the 18th ACM Symposium on Operating
Systems Principles (SOSP 01), pages 174187, Chateau Lake Louise, Banff,
Canada. (Cited on page 178.)
Nadiminti, K., Assuncao, M. D. D., and Buyya, R. (2006). Distributed systems
and recent innovations: Challenges and benefits. InfoNet Magazine, 16(3):15.
(Cited on page 157.)
Open Source Initiative (2012). The open source definition. http://www.
opensource.org/docs/OSD. (Cited on page 7.)
Opscode (2011). Chef - opscode. http://www.opscode.com/chef. (Cited
on pages 60, 122, and 126.)
Oreizy, P., Medvidovic, N., and Taylor, R. N. (1998). Architecture-based run-
time software evolution. In Proceedings of the 20th international conference on
Software engineering, ICSE 98, pages 177186, Washington, DC, USA. IEEE
Computer Society. (Cited on pages 115 and 116.)
Pant, R. (2009). Organizing a digital technology department of medium size
in a media company. Available at http://www.rajiv.com/blog/2009/
03/17/technology-department. (Cited on page 13.)
Papazoglou, M. P., Traverso, P., Dustdar, S., and Leymann, F. (2007). Service-
oriented computing: State of the art and research challenges. Computer,
40:3845. (Cited on pages 8 and 65.)
Pfeiffer, T. (1998). Windows DLLs: Threat or Menace? Available at http:
//drdobbs.com/184410810. (Cited on pages 7 and 220.)
Puppet Labs (2011). Puppet platform: An open source platform for en-
terprise systems management. http://www.puppetlabs.com/puppet/
puppet-platform. (Cited on pages 60, 122, and 126.)
Raymond, E. S. (2003). The Art of Unix Programming. Thyrsus Enter-
prises. Also available at: http://www.faqs.org/docs/artu. (Cited on
page 57.)
Red Hat, Inc. (2009). Red Hat Enterprise Linux 5 Virtualization Guide. Red Hat,
Inc., fourth edition. (Cited on page 147.)
Reimer, D., Thomas, A., Ammons, G., Mummert, T., Alpern, B., and Bala, V.
(2008). Opening black boxes: Using semantic information to combat virtual
machine image sprawl. In Proceedings of the Fourth ACM SIGPLAN/SIGOPS
International Conference on Virtual Execution Environments, pages 111120.
ACM. (Cited on page 153.)
Rigole, P. and Berbers, Y. (2006). Resource-driven collaborative component
deployment in mobile environments. In Proc. of the International Conference on
Autonomic and Autonomous Systems, ICAS 06, Washington, DC, USA. IEEE
Computer Society. (Cited on page 116.)
Rosen, L. (2004). Open Source Licensing: Software Freedom and Intellectual Prop-
erty Law. Prentice Hall. (Cited on page 199.)
Roussain, H. and Guidec, F. (2005). Cooperative component-based software
deployment in wireless ad hoc networks. In CD 05: Proc. of the 3rd Work-
ing Conf. on Component Deployment, pages 115. Springer-Verlag. (Cited on
pages 94 and 245.)
Rutherford, M. J., Anderson, K. M., Carzaniga, A., Heimbigner, D., and Wolf,
A. L. (2002). Reconfiguration in the Enterprise JavaBean Component Model.
In CD 02: Proc. of the IFIP/ACM Working Conf. on Component Deployment,
pages 6781. Springer-Verlag. (Cited on pages 66 and 93.)
Rutherford, M. J., Carzaniga, A., and Wolf, A. L. (2008). Evaluating test suites
and adequacy criteria using simulation-based models of distributed systems.
IEEE Transactions on Software Engineering, 34(4):452470. (Cited on pages 14
and 153.)
Schneider, B. (1996). Applied Cryptography. John Wiley & Sons, second edi-
tion. (Cited on page 24.)
Skeen, D. and Stonebraker, M. (1987). A formal model of crash recovery in a
distributed system. In Concurrency control and reliability in distributed systems,
pages 295317, New York, NY, USA. Van Nostrand Reinhold Co. (Cited on
pages 160 and 161.)
Software Freedom Law Center (2008). BusyBox Developers Agree To End
GPL Lawsuit Against Verizon. http://www.softwarefreedom.org/
news/2008/mar/17/busybox-verizon/. (Cited on page 183.)
Stevens, W. R. and Rago, S. A. (2005). Advanced Programming in the UNIX
Environment. Addison-Wesley, second edition. (Cited on page 140.)
Sudmann, N. P. and Johansen, D. (2002). Software deployment using mo-
bile agents. In CD 02: Proc. of the IFIP/ACM Working Conf. on Component
Deployment, pages 97107. Springer-Verlag. (Cited on page 94.)
Szyperski, C., Gruntz, D., and Murer, S. (2002). Component Software: Beyond
Object-Oriented Programming. Addison Wesley, second edition. (Cited on
pages 6, 221, and 232.)
Tanenbaum, A. S. and Woodhull, A. S. (1997). Operating Systems: Design and
Implementation. Prentice Hall, second edition. (Cited on page 5.)
Taylor, R. N., Medvidovic, N., and Dashofy, E. M. (2010). Software Architec-
ture: Foundations, Theory, and Practice. John Wiley & Sons, Inc. (Cited on
pages 15, 49, 50, 56, and 57.)
The JBoss Visual Design Team (2012). Hibernate community documen-
tation. Available at http://docs.jboss.org/hibernate/orm/4.1/
quickstart/en-US/html. (Cited on page 206.)
The Joint Task Force on Computing Curricula (2004). Curriculum guide-
lines for undergraduate degree programs in software engineering. Software
Engineering 2004. (Cited on page 3.)
The Linux Foundation (2010). Linux Standard Base Core Specification
4.1. Available at http://refspecs.linux-foundation.org/LSB_4.
1.0/LSB-Core-generic/LSB-Core-generic/book1.html. (Cited on
page 46.)
The Linux Foundation (2012). FHS 3.0 Draft 1. Available
at http://www.linuxfoundation.org/collaborate/workgroups/
lsb/fhs-30-draft-1. (Cited on page 35.)
Traugott, S. and Brown, L. (2002). Why order matters: Turing equivalence in
automated systems administration. In Proceedings of the 16th Systems Admin-
istration Conference (LISA 02), pages 99120. USENIX. (Cited on pages 125
and 152.)
Trezentos, P., Lynce, I., and Oliveira, A. L. (2010). Apt-pbo: solving the soft-
ware dependency problem using pseudo-boolean optimization. In Proceed-
ings of the IEEE/ACM international conference on Automated software engineering,
ASE 10, pages 427436, New York, NY, USA. ACM. (Cited on page 47.)
Tu, Q. and Godfrey, M. W. (2001). The build-time software architecture view.
In ICSM, pages 398407. (Cited on page 198.)
Tuunanen, T., Koskinen, J., and Kärkkäinen, T. (2009). Automated soft-
ware license analysis. Automated Software Engineering, 16:455490. (Cited
on pages 187 and 199.)
Tuunanen, T., Koskinen, J., and Kärkkäinen, T. (2006). Retrieving open source
software licenses. In Damiani, E., Fitzgerald, B., Scacchi, W., Scotto, M., and
Succi, G., editors, Open Source Systems, IFIP Working Group 2.13 Foundation on
Open Source Software, volume 203 of IFIP, pages 3546. Springer. (Cited on
page 199.)
Van den Brand, M. G. T., de Jong, H. A., Klint, P., and Olivier, P. A. (2000).
Efficient annotated terms. Softw. Pract. Exper., 30(3):259291. (Cited on
page 208.)
van der Burg, S. (2011a). An evaluation and comparison of GoboLinux.
Available at: http://sandervanderburg.blogspot.com/2011/
12/evaluation-and-comparison-of-gobolinux.html. (Cited on
page 48.)
van der Burg, S. (2011b). Deploying .NET applications with the Nix package
manager. Available at: http://sandervanderburg.blogspot.com/
2011/09/deploying-net-applications-with-nix.html. (Cited on
page 220.)
van der Burg, S. (2011c). Deploying .NET applications with the Nix package
manager (part 2). Available at: http://sandervanderburg.blogspot.
com/2011/09/deploying-net-applications-with-nix_14.html.
(Cited on page 220.)
van der Burg, S. (2011d). Deploying .NET services with Disnix. Available
at: http://sandervanderburg.blogspot.nl/2011/10/deploying-
net-services-with-disnix.html. (Cited on page 220.)
van der Burg, S. (2011e). Disnix user’s guide. Available at: http://nixos.
org/disnix/docs.html. (Cited on page 82.)
van der Burg, S. (2011f). Free (and open-source) software. Available
at: http://sandervanderburg.blogspot.com/2011/02/free-and-
open-source-software.html. (Cited on page 7.)
van der Burg, S. (2011g). On Nix, NixOS and the Filesystem Hierarchy Stan-
dard (FHS). Available at: http://sandervanderburg.blogspot.com/
2011/11/on-nix-nixos-and-filesystem-hierarchy.html. (Cited
on page 48.)
van der Burg, S. (2012). A generic approach for deploying and upgrading
mutable software components. In Dig, D. and Wahler, M., editors, Fourth
Workshop on Hot Topics in Software Upgrades (HotSWUp). IEEE Computer So-
ciety. (Cited on page 21.)
van der Burg, S., Davies, J., Dolstra, E., German, D. M., and Hemel, A. (2012).
Discovering software license constraints: Identifying a binary’s sources by
tracing build processes. Technical Report TUD-SERG-2012-010, Software En-
gineering Research Group, Delft, The Netherlands. (Cited on page 21.)
van der Burg, S. and Dolstra, E. (2010a). Automated deployment of a hetero-
geneous service-oriented system. In 36th EUROMICRO Conference on Software
Engineering and Advanced Applications (SEAA). IEEE Computer Society. (Cited
on pages 20, 73, and 127.)
van der Burg, S. and Dolstra, E. (2010b). Automating system tests using
declarative virtual machines. In 21st IEEE International Symposium on Software
Reliability Engineering (ISSRE). IEEE Computer Society. (Cited on page 20.)
van der Burg, S. and Dolstra, E. (2010c). Declarative testing and deployment
of distributed systems. Technical Report TUD-SERG-2010-020, Software En-
gineering Research Group, Delft, The Netherlands. (Cited on page 21.)
van der Burg, S. and Dolstra, E. (2010d). Disnix: A toolset for distributed
deployment. In van den Brand, M., Mens, K., Kienle, H., and Cleve, A.,
editors, Third International Workshop on Academic Software Development Tools
and Techniques (WASDeTT-3). WASDeTT. (Cited on pages 20, 73, and 92.)
van der Burg, S. and Dolstra, E. (2011). A self-adaptive deployment frame-
work for service-oriented systems. In 6th International Symposium on Software
Engineering for Adaptive and Self-Managing Systems (SEAMS). ACM. (Cited on
page 20.)
van der Burg, S. and Dolstra, E. (2012). Disnix: A toolset for distributed
deployment. Science of Computer Programming, (0):–. (Cited on page 20.)
van der Burg, S., Dolstra, E., and de Jonge, M. (2008). Atomic upgrading of
distributed systems. In Dumitras, T., Dig, D., and Neamtiu, I., editors, First
ACM Workshop on Hot Topics in Software Upgrades (HotSWUp). ACM. (Cited
on pages 21, 73, and 92.)
van der Burg, S., Dolstra, E., de Jonge, M., and Visser, E. (2009). Software
deployment in a dynamic cloud: From device to service orientation in a hos-
pital environment. In Bhattacharya, K., Bichler, M., and Tai, S., editors, First
ICSE 2009 Workshop on Software Engineering Challenges in Cloud Computing.
IEEE Computer Society. (Cited on page 20.)
van der Hoek, A., Hall, R. S., Heimbigner, D., and Wolf, A. L. (1997). Soft-
ware release management. In Proceedings of the 6th European SOFTWARE
ENGINEERING conference held jointly with the 5th ACM SIGSOFT international
symposium on Foundations of software engineering, ESEC '97/FSE-5, pages 159–175, New York, NY, USA. Springer-Verlag New York, Inc. (Cited on page 4.)
Vandewoude, Y., Ebraert, P., Berbers, Y., and D’Hondt, T. (2007). Tranquil-
ity: A low disruptive alternative to quiescence for ensuring safe dynamic
updates. IEEE Transactions on Software Engineering, 33(12):856868. (Cited on
page 166.)
Vermolen, S. D., Wachsmuth, G., and Visser, E. (2011). Generating database
migrations for evolving web applications. In Proceedings of the 10th ACM
international conference on Generative programming and component engineering,
GPCE 11, pages 8392, New York, NY, USA. ACM. (Cited on page 218.)
Visser, E. (1997). Syntax Definition for Language Prototyping. PhD thesis, Uni-
versity of Amsterdam. (Cited on page 208.)
Visser, E. (2008). WebDSL: A case study in domain-specific language engi-
neering. In Lammel, R., Saraiva, J., and Visser, J., editors, Generative and Trans-
formational Techniques in Software Engineering (GTTSE 2007), Lecture Notes in
Computer Science. Springer. (Cited on pages 20, 58, 203, and 204.)
Visual C# tutorials (2011). Installing sql server 2008 express.
http://visualcsharptutorials.com/2011/05/installing-
sql-server-2008-express. (Cited on page 229.)
Wang, Y., Rutherford, M. J., Carzaniga, A., and Wolf, A. L. (2005). Au-
tomating experimentation on distributed testbeds. In Proceedings of the 20th
IEEE/ACM International Conference on Automated Software Engineering, pages
164173. ACM. (Cited on page 153.)
Wegner, P. (1976). Research paradigms in computer science. In Proceed-
ings of the 2nd international conference on Software engineering, ICSE 76, pages
322–330, Los Alamitos, CA, USA. IEEE Computer Society Press. (Cited on
page iv.)
Wichadakul, D. and Nahrstedt, K. (2002). A Translation System for Enabling
Flexible and Efficient Deployment of QoS-Aware Applications in Ubiquitous
Environments. In CD 02: Proc. of the 1st Working Conf. on Component Deploy-
ment, pages 287310. Springer-Verlag. (Cited on page 116.)
Samenvatting
A REFERENCE ARCHITECTURE FOR DISTRIBUTED SOFTWARE DEPLOYMENT
Sander van der Burg

Today, many software systems are larger and more complex than one might think. For this reason, software engineering has become an important scientific research discipline within computer science over the last 30 years. Within this research discipline, various activities related to building software are studied, investigated and improved, including specifying system requirements, creating system designs, writing and understanding program code, and testing systems.
Besides ensuring that a system is built correctly and does what the client has asked for, systems must also be made available for use to end users, or be tested within an isolated test environment so that their correctness can be verified. Within the software engineering domain, this process is called software deployment, which can be described as all the activities that are necessary to make a system available for use. Examples of such activities are building software components, transferring components from producer to customer, activating software components, and upgrading. Software deployment is a problem of ever-increasing complexity that has generally been underestimated by both researchers and practitioners.
SOFTWARE DEPLOYMENT COMPLEXITY

There are several reasons why software deployment has become so complex. In the early days, when the first digital computers were built, software deployment was a non-existent problem. Programs were written directly in machine language for one specific kind of machine. Programming was generally very complex: besides the fact that a program had to work correctly, a programmer was also required to know precisely how a machine worked and how peripherals, such as printers, could be controlled. As soon as new computers appeared, old programs were thrown away and rewritten.
To cope with this complexity, various new techniques have been developed. New higher-level programming languages have been developed that are closer to the domain of a problem than to the details of the machine. Operating systems have been created that hide the complexity of peripherals, so that they can easily be controlled from programs. Furthermore, programs are no longer self-contained: they use off-the-shelf software components that provide the programmer with various standard functionality, such as the construction of graphical user interfaces containing windows and buttons. All these factors have considerably improved the productivity of programmers and the quality of computer programs.
Moreover, computer programs are no longer developed specifically for one kind of device. Today, computer programs are made available as services over the Internet (such as Facebook, http://www.facebook.com, and Twitter, http://www.twitter.com) and must be usable on a wide range of devices, varying from desktop computers and tablets to mobile phones.
Unfortunately, all these developments also have a downside. Because programs are no longer self-contained objects, being sure that they work well also requires being sure that all their dependencies (such as other software components and compilers) are present and correct. Although this sounds obvious, in practice certain components often turn out to be missing or to be of the wrong version, which frequently causes a program not to work at all. Furthermore, programs sometimes have to be upgraded, because reinstalling systems is often time consuming. These processes are often dangerous and destructive: if an upgrade fails, a system may be left behind in a state in which it is no longer usable. Moreover, it is also difficult (or sometimes impossible) to roll back to an old version. Besides the difficulties of the deployment activities themselves, they are also frequently performed manually by system administrators or developers, which costs a lot of time and is highly error prone. During such activities, a system is often temporarily unavailable, either completely or partially, which is not always desirable.
Modern-generation systems are even more complex than ordinary desktop programs. Although they often look simple to end users, they are composed of components written in different programming languages, using different kinds of components, and residing on different machines in a data centre or on the Internet. These components depend on each other, and a missing dependency may cause a system to stop functioning. Moreover, these components may be installed on machines with different properties, such as different operating systems and varying amounts of system resources.
Furthermore, all kinds of non-functional aspects are important when deploying modern-generation systems. For example, if a system provides privacy-sensitive information, it is often undesirable to deploy such a component in a location where it is publicly accessible. Machines in a network may also fail, which can cause parts of a system to disappear so that it no longer functions. It may therefore be desirable to deploy multiple redundant instances that can serve as a fallback. Another important aspect is that software components from external parties are frequently used, such as free and open-source software components. Although these are often available free of charge, certain license conditions must be observed. It is therefore essential to know which components are used and what their conditions prescribe.
Non-functional deployment aspects are often specific to a particular domain. In the medical domain privacy may be very important, while in the financial domain reliability plays a major role.
Finally, nowadays it is also desirable to be able to deliver value quickly: many software products are developed by means of agile development methods, in which short iterations of roughly two to four weeks are carried out and a new prototype of the product is delivered at the end of each iteration (sprint). To be able to deliver or test these prototypes, performing software deployment activities quickly and correctly is crucial.
A REFERENCE ARCHITECTURE FOR DISTRIBUTED SOFTWARE DEPLOYMENT

For all the reasons mentioned above, a fully automated deployment solution is needed that can deploy software in a reliable, reproducible, generic, extensible and efficient manner. In earlier research carried out by Eelco Dolstra, the Nix package manager was developed, a software tool that can deploy software components automatically in a reliable, reproducible, generic and efficient way. Nix is also an important foundation for two other applications: NixOS is a Linux distribution that is deployed entirely through Nix and can roll out complete system configurations based on a model, and Hydra is a build and test environment built on top of Nix.
Although Nix and the Nix applications offer a number of significant advantages over conventional deployment solutions, certain aspects of modern-generation systems are not supported. One of these aspects is that Nix is limited to managing local or intra-dependencies. Modern-generation systems consist of components residing in different locations that also depend on each other (inter-dependencies); these must also be modelled and their deployment automated. Furthermore, Nix offers no solutions for dealing with non-functional requirements, such as privacy, reliability, recovering a deployment after a failure, or complying with the license conditions of external components.
To provide a solution for modern-generation systems, a new automated solution is needed that offers these desirable quality attributes. However, modern-generation systems are so diverse, and non-functional requirements vary so much per domain, that it is virtually impossible to provide a universal solution that supports all of these systems. Instead, we define a reference architecture for distributed deployment. This reference architecture contains various components that automate distributed software deployment activities in a reliable, reproducible, generic and efficient way. Moreover, it offers integration facilities for domain-specific components that provide solutions for desirable non-functional requirements.
A developer can transform the reference architecture into a concrete architecture for an automated deployment tool that implements the desired functional and non-functional deployment aspects for a certain domain. As an important ingredient to ensure that our desired quality attributes are pursued, we use an architectural pattern that we call purely functional batch processing, with which we integrate the components with each other. In addition, we have integrated Nix and NixOS into the reference architecture, which take care of the deployment of local systems, and we have developed a number of new components that make it possible to perform crucial distributed deployment activities.
THE ASPECTS OF THE REFERENCE ARCHITECTURE

A large part of this thesis is devoted to new components in the reference architecture that automate various software deployment aspects, including:

- Disnix (Chapter 4) is a tool that extends Nix by making it possible to model distributed components (including their inter-dependencies) and to deploy them automatically in a network of machines with different properties, in a reliable, reproducible, generic and efficient way. Chapter 8 is devoted to the atomic upgrade aspect of Disnix, which can also be used in other parts of the reference architecture.
- Dynamic Disnix (Chapter 5) is a generic extension on top of Disnix that makes it possible to deploy components dynamically to machines in a network, based on functional and non-functional properties that can be defined by users themselves. In addition, this extension can automatically and efficiently redeploy a system when an event occurs, such as a machine disappearing from the network due to a failure.
- DisnixOS (Chapter 6) is a complement to Disnix that, in addition to the distributed components deployed by Disnix, can also deploy networks of NixOS system configurations, both on physical hardware and on virtual hardware at, for example, a cloud provider such as Amazon EC2, through a tool called charon.
- The NixOS test driver (Chapter 7) demonstrates a distributed testing method based on NixOS. Because Nix and NixOS can guarantee reproducibility very well, we can also efficiently create networks of virtual machines for testing purposes, by sharing components that are identical.
- Dysnomia (Chapter 9) describes a prototype that can deploy mutable components. To guarantee reliability and reproducibility, components in Nix and the Nix applications cannot be modified after they have been built. However, some types of components, such as databases, cannot be managed in this way. Dysnomia offers a solution to manage these components in a reliable, reproducible and generic way.
- grok-trace (Chapter 10) describes a prototype that can dynamically trace which files are used in a build process and in what way. In combination with a license analysis tool, it can be determined whether licenses are complied with correctly.

The last two chapters describe how these aspects can be applied. Chapter 11 describes an example in which we use parts of the reference architecture to automate the deployment of WebDSL applications, a web development environment that we use internally at TU Delft. Chapter 12 describes how we can apply the components in unusual environments with unusual characteristics, such as a software system using .NET technology.
RESULT

In the end, we have been able to implement the components of the reference architecture and to achieve a large part of our desired quality attributes. An important caveat is that, in order to use these components optimally for an arbitrary system, the system must be properly divided into components, all of which must be modelled.
Furthermore, not all components are equally mature, there are open issues that leave room for improvement, and there are new areas that still need to be investigated. Important open aspects are that mutable components cannot be deployed efficiently and that only simple network topologies are supported.
Curriculum Vitae
Sander van der Burg was born on May 24th 1984 in Numansdorp, The Nether-
lands.
EDUCATION
2008 – 2012
PhD Software Engineering
Delft University of Technology
Dissertation title: “A Reference Architecture for Distributed Software
Deployment”
Followed courses: Scientific Writing in English, “The Art of Presenting
Science”, Philips Informatics Infrastructure Basic training.
2005 – 2009
M.Sc. Computer Science
Delft University of Technology
Specialisation: Software Evolution, Model-driven Software Engineer-
ing
Free-elective courses: Computer graphics, Speech and language pro-
cessing, Methodology and Ethics
2001 – 2005 B.ICT. Computer Science
Rotterdam University of Applied Sciences
Specialisation: Software Engineering
Free elective courses: Linux assembly, Sailing license, Visual Basic,
Astronomy
1996 – 2001 HAVO
RSG Hoeksche Waard
Languages: Dutch, English, French 1 (speech and listening skills)
Profile: Nature & Technology
Free elective courses: Economics 1, Computer Science
WORK EXPERIENCE
2012 – present Software Architect
Conference Compass B.V.
2008 – 2012 PhD candidate
Delft University of Technology
Working on a PhD dissertation titled: “A Reference Architecture for
Distributed Software Deployment”
Developing software tools related to software deployment and domain-
specific languages: NixOS, Disnix, WebDSL
Teaching assistant for the “Concepts of programming languages” lab
course, covering several programming language paradigms, such as
Scala (functional programming), C (low-level) and JavaScript (prototype-
based object orientation)
2009 – 2012 Visiting researcher
Koninklijke Philips Electronics N.V.
Applying deployment research on a medical imaging software platform
2008 Intern Healthcare Systems Architecture
Koninklijke Philips Electronics N.V.
Development of the first prototype of Disnix
Applying development techniques on an experimental medical system
2002 – 2008 IT specialist
Bosman Watermanagement B.V.
Web application framework development based on PHP, Apache and
MySQL
Fat-client Java application development
Integration of custom applications with an ERP system
Supporting system administration tasks and management of Linux,
Novell Netware, and Microsoft Windows 2000 machines.
2003 and 2005 Intern Hone/Link ADM
IBM Netherlands B.V.
Developing a system measuring the usage of Java EE applications in an
IBM zSeries mainframe environment for cost distribution.
Titles in the IPA Dissertation Series Since 2007
H.A. de Jong. Flexible Heterogeneous Software
Systems. Faculty of Natural Sciences, Mathe-
matics, and Computer Science, UvA. 2007-01
N.K. Kavaldjiev. A run-time reconfigurable
Network-on-Chip for streaming DSP applications.
Faculty of Electrical Engineering, Mathematics
& Computer Science, UT. 2007-02
M. van Veelen. Considerations on Model-
ing for Early Detection of Abnormalities in Lo-
cally Autonomous Distributed Systems. Fac-
ulty of Mathematics and Computing Sciences,
RUG. 2007-03
T.D. Vu. Semantics and Applications of Process
and Program Algebra. Faculty of Natural Sci-
ences, Mathematics, and Computer Science,
UvA. 2007-04
L. Brandán Briones. Theories for Model-based
Testing: Real-time and Coverage. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2007-05
I. Loeb. Natural Deduction: Sharing by Presen-
tation. Faculty of Science, Mathematics and
Computer Science, RU. 2007-06
M.W.A. Streppel. Multifunctional Geometric
Data Structures. Faculty of Mathematics and
Computer Science, TU/e. 2007-07
N. Trčka. Silent Steps in Transition Systems and
Markov Chains. Faculty of Mathematics and
Computer Science, TU/e. 2007-08
R. Brinkman. Searching in encrypted data. Fac-
ulty of Electrical Engineering, Mathematics &
Computer Science, UT. 2007-09
A. van Weelden. Putting types to good use. Fac-
ulty of Science, Mathematics and Computer
Science, RU. 2007-10
J.A.R. Noppen. Imperfect Information in Soft-
ware Development Processes. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2007-11
R. Boumen. Integration and Test plans for Com-
plex Manufacturing Systems. Faculty of Me-
chanical Engineering, TU/e. 2007-12
A.J. Wijs. What to do Next?: Analysing and Op-
timising System Behaviour in Time. Faculty of
Sciences, Division of Mathematics and Com-
puter Science, VUA. 2007-13
C.F.J. Lange. Assessing and Improving the Qual-
ity of Modeling: A Series of Empirical Studies
about the UML. Faculty of Mathematics and
Computer Science, TU/e. 2007-14
T. van der Storm. Component-based Configura-
tion, Integration and Delivery. Faculty of Natu-
ral Sciences, Mathematics, and Computer Sci-
ence, UvA. 2007-15
B.S. Graaf. Model-Driven Evolution of Soft-
ware Architectures. Faculty of Electrical Engi-
neering, Mathematics, and Computer Science,
TUD. 2007-16
A.H.J. Mathijssen. Logical Calculi for Reason-
ing with Binding. Faculty of Mathematics and
Computer Science, TU/e. 2007-17
D. Jarnikov. QoS framework for Video Streaming
in Home Networks. Faculty of Mathematics and
Computer Science, TU/e. 2007-18
M. A. Abam. New Data Structures and Algo-
rithms for Mobile Data. Faculty of Mathematics
and Computer Science, TU/e. 2007-19
W. Pieters. La Volonté Machinale: Understand-
ing the Electronic Voting Controversy. Faculty of
Science, Mathematics and Computer Science,
RU. 2008-01
A.L. de Groot. Practical Automaton Proofs
in PVS. Faculty of Science, Mathematics and
Computer Science, RU. 2008-02
M. Bruntink. Renovation of Idiomatic Cross-
cutting Concerns in Embedded Systems. Faculty
of Electrical Engineering, Mathematics, and
Computer Science, TUD. 2008-03
A.M. Marin. An Integrated System to Manage
Crosscutting Concerns in Source Code. Faculty
of Electrical Engineering, Mathematics, and
Computer Science, TUD. 2008-04
N.C.W.M. Braspenning. Model-based Integra-
tion and Testing of High-tech Multi-disciplinary
Systems. Faculty of Mechanical Engineering,
TU/e. 2008-05
M. Bravenboer. Exercises in Free Syntax:
Syntax Definition, Parsing, and Assimilation of
Language Conglomerates. Faculty of Science,
UU. 2008-06
M. Torabi Dashti. Keeping Fairness Alive: De-
sign and Formal Verification of Optimistic Fair
Exchange Protocols. Faculty of Sciences, Divi-
sion of Mathematics and Computer Science,
VUA. 2008-07
I.S.M. de Jong. Integration and Test Strategies
for Complex Manufacturing Machines. Faculty
of Mechanical Engineering, TU/e. 2008-08
I. Hasuo. Tracing Anonymity with Coalgebras.
Faculty of Science, Mathematics and Com-
puter Science, RU. 2008-09
L.G.W.A. Cleophas. Tree Algorithms: Two Tax-
onomies and a Toolkit. Faculty of Mathematics
and Computer Science, TU/e. 2008-10
I.S. Zapreev. Model Checking Markov Chains:
Techniques and Tools. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2008-11
M. Farshi. A Theoretical and Experimental Study
of Geometric Networks. Faculty of Mathematics
and Computer Science, TU/e. 2008-12
G. Gulesir. Evolvable Behavior Specifications Us-
ing Context-Sensitive Wildcards. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2008-13
F.D. Garcia. Formal and Computational Cryptog-
raphy: Protocols, Hashes and Commitments. Fac-
ulty of Science, Mathematics and Computer
Science, RU. 2008-14
P. E. A. Dürr. Resource-based Verification for Ro-
bust Composition of Aspects. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2008-15
E.M. Bortnik. Formal Methods in Support of
SMC Design. Faculty of Mechanical Engineer-
ing, TU/e. 2008-16
R.H. Mak. Design and Performance Analysis
of Data-Independent Stream Processing Systems.
Faculty of Mathematics and Computer Sci-
ence, TU/e. 2008-17
M. van der Horst. Scalable Block Processing Al-
gorithms. Faculty of Mathematics and Com-
puter Science, TU/e. 2008-18
C.M. Gray. Algorithms for Fat Objects: Decom-
positions and Applications. Faculty of Mathe-
matics and Computer Science, TU/e. 2008-19
J.R. Calamé. Testing Reactive Systems with Data
- Enumerative Methods and Constraint Solving.
Faculty of Electrical Engineering, Mathemat-
ics & Computer Science, UT. 2008-20
E. Mumford. Drawing Graphs for Cartographic
Applications. Faculty of Mathematics and
Computer Science, TU/e. 2008-21
E.H. de Graaf. Mining Semi-structured Data,
Theoretical and Experimental Aspects of Pattern
Evaluation. Faculty of Mathematics and Natu-
ral Sciences, UL. 2008-22
R. Brijder. Models of Natural Computation:
Gene Assembly and Membrane Systems. Fac-
ulty of Mathematics and Natural Sciences,
UL. 2008-23
A. Koprowski. Termination of Rewriting and
Its Certification. Faculty of Mathematics and
Computer Science, TU/e. 2008-24
U. Khadim. Process Algebras for Hybrid Sys-
tems: Comparison and Development. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2008-25
J. Markovski. Real and Stochastic Time in Pro-
cess Algebras for Performance Evaluation. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2008-26
H. Kastenberg. Graph-Based Software Specifica-
tion and Verification. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2008-27
I.R. Buhan. Cryptographic Keys from Noisy Data
Theory and Applications. Faculty of Electrical
Engineering, Mathematics & Computer Sci-
ence, UT. 2008-28
R.S. Marin-Perianu. Wireless Sensor Networks
in Motion: Clustering Algorithms for Service Dis-
covery and Provisioning. Faculty of Electrical
Engineering, Mathematics & Computer Sci-
ence, UT. 2008-29
M.H.G. Verhoef. Modeling and Validating Dis-
tributed Embedded Real-Time Control Systems.
Faculty of Science, Mathematics and Com-
puter Science, RU. 2009-01
M. de Mol. Reasoning about Functional Pro-
grams: Sparkle, a proof assistant for Clean. Fac-
ulty of Science, Mathematics and Computer
Science, RU. 2009-02
M. Lormans. Managing Requirements Evolu-
tion. Faculty of Electrical Engineering, Mathe-
matics, and Computer Science, TUD. 2009-03
M.P.W.J. van Osch. Automated Model-based
Testing of Hybrid Systems. Faculty of Mathe-
matics and Computer Science, TU/e. 2009-04
H. Sozer. Architecting Fault-Tolerant Soft-
ware Systems. Faculty of Electrical Engi-
neering, Mathematics & Computer Science,
UT. 2009-05
M.J. van Weerdenburg. Efficient Rewriting
Techniques. Faculty of Mathematics and Com-
puter Science, TU/e. 2009-06
H.H. Hansen. Coalgebraic Modelling: Applica-
tions in Automata Theory and Modal Logic. Fac-
ulty of Sciences, Division of Mathematics and
Computer Science, VUA. 2009-07
A. Mesbah. Analysis and Testing of Ajax-based
Single-page Web Applications. Faculty of Electri-
cal Engineering, Mathematics, and Computer
Science, TUD. 2009-08
A.L. Rodriguez Yakushev. Towards Getting Ge-
neric Programming Ready for Prime Time. Fac-
ulty of Science, UU. 2009-9
K.R. Olmos Joffré. Strategies for Context Sensi-
tive Program Transformation. Faculty of Science,
UU. 2009-10
J.A.G.M. van den Berg. Reasoning about Java
programs in PVS using JML. Faculty of Sci-
ence, Mathematics and Computer Science,
RU. 2009-11
M.G. Khatib. MEMS-Based Storage Devices. In-
tegration in Energy-Constrained Mobile Systems.
Faculty of Electrical Engineering, Mathematics
& Computer Science, UT. 2009-12
S.G.M. Cornelissen. Evaluating Dynamic Anal-
ysis Techniques for Program Comprehension. Fac-
ulty of Electrical Engineering, Mathematics,
and Computer Science, TUD. 2009-13
D. Bolzoni. Revisiting Anomaly-based Network
Intrusion Detection Systems. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2009-14
H.L. Jonker. Security Matters: Privacy in Vot-
ing and Fairness in Digital Exchange. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2009-15
M.R. Czenko. TuLiP - Reshaping Trust Manage-
ment. Faculty of Electrical Engineering, Math-
ematics & Computer Science, UT. 2009-16
T. Chen. Clocks, Dice and Processes. Faculty of
Sciences, Division of Mathematics and Com-
puter Science, VUA. 2009-17
C. Kaliszyk. Correctness and Availability: Build-
ing Computer Algebra on top of Proof Assistants
and making Proof Assistants available over the
Web. Faculty of Science, Mathematics and
Computer Science, RU. 2009-18
R.S.S. O’Connor. Incompleteness & Complete-
ness: Formalizing Logic and Analysis in Type
Theory. Faculty of Science, Mathematics and
Computer Science, RU. 2009-19
B. Ploeger. Improved Verification Methods for
Concurrent Systems. Faculty of Mathematics
and Computer Science, TU/e. 2009-20
T. Han. Diagnosis, Synthesis and Analysis of
Probabilistic Models. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2009-21
R. Li. Mixed-Integer Evolution Strategies for Pa-
rameter Optimization and Their Applications to
Medical Image Analysis. Faculty of Mathemat-
ics and Natural Sciences, UL. 2009-22
J.H.P. Kwisthout. The Computational Complex-
ity of Probabilistic Networks. Faculty of Science,
UU. 2009-23
T.K. Cocx. Algorithmic Tools for Data-Oriented
Law Enforcement. Faculty of Mathematics and
Natural Sciences, UL. 2009-24
A.I. Baars. Embedded Compilers. Faculty of Sci-
ence, UU. 2009-25
M.A.C. Dekker. Flexible Access Control for Dy-
namic Collaborative Environments. Faculty of
Electrical Engineering, Mathematics & Com-
puter Science, UT. 2009-26
J.F.J. Laros. Metrics and Visualisation for Crime
Analysis and Genomics. Faculty of Mathematics
and Natural Sciences, UL. 2009-27
C.J. Boogerd. Focusing Automatic Code Inspec-
tions. Faculty of Electrical Engineering, Math-
ematics, and Computer Science, TUD. 2010-01
M.R. Neuhäußer. Model Checking Nondeter-
ministic and Randomly Timed Systems. Fac-
ulty of Electrical Engineering, Mathematics &
Computer Science, UT. 2010-02
J. Endrullis. Termination and Productivity. Fac-
ulty of Sciences, Division of Mathematics and
Computer Science, VUA. 2010-03
T. Staijen. Graph-Based Specification and Verifi-
cation for Aspect-Oriented Languages. Faculty of
Electrical Engineering, Mathematics & Com-
puter Science, UT. 2010-04
Y. Wang. Epistemic Modelling and Protocol Dy-
namics. Faculty of Science, UvA. 2010-05
J.K. Berendsen. Abstraction, Prices and Proba-
bility in Model Checking Timed Automata. Fac-
ulty of Science, Mathematics and Computer
Science, RU. 2010-06
A. Nugroho. The Effects of UML Modeling on
the Quality of Software. Faculty of Mathematics
and Natural Sciences, UL. 2010-07
A. Silva. Kleene Coalgebra. Faculty of Sci-
ence, Mathematics and Computer Science,
RU. 2010-08
J.S. de Bruin. Service-Oriented Discovery of
Knowledge - Foundations, Implementations and
Applications. Faculty of Mathematics and Nat-
ural Sciences, UL. 2010-09
D. Costa. Formal Models for Component Connec-
tors. Faculty of Sciences, Division of Mathe-
matics and Computer Science, VUA. 2010-10
M.M. Jaghoori. Time at Your Service: Schedula-
bility Analysis of Real-Time and Distributed Ser-
vices. Faculty of Mathematics and Natural Sci-
ences, UL. 2010-11
R. Bakhshi. Gossiping Models: Formal Analysis
of Epidemic Protocols. Faculty of Sciences, De-
partment of Computer Science, VUA. 2011-01
B.J. Arnoldus. An Illumination of the Template
Enigma: Software Code Generation with Tem-
plates. Faculty of Mathematics and Computer
Science, TU/e. 2011-02
E. Zambon. Towards Optimal IT Availability
Planning: Methods and Tools. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2011-03
L. Astefanoaei. An Executable Theory of Multi-
Agent Systems Refinement. Faculty of Mathe-
matics and Natural Sciences, UL. 2011-04
J. Proença. Synchronous coordination of dis-
tributed components. Faculty of Mathematics
and Natural Sciences, UL. 2011-05
A. Moralı. IT Architecture-Based Confidential-
ity Risk Assessment in Networks of Organizations.
Faculty of Electrical Engineering, Mathematics
& Computer Science, UT. 2011-06
M. van der Bijl. On changing models in
Model-Based Testing. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2011-07
C. Krause. Reconfigurable Component Connec-
tors. Faculty of Mathematics and Natural Sci-
ences, UL. 2011-08
M.E. Andrés. Quantitative Analysis of Informa-
tion Leakage in Probabilistic and Nondeterministic
Systems. Faculty of Science, Mathematics and
Computer Science, RU. 2011-09
M. Atif. Formal Modeling and Verification of Dis-
tributed Failure Detectors. Faculty of Mathemat-
ics and Computer Science, TU/e. 2011-10
P.J.A. van Tilburg. From Computability to Exe-
cutability - A process-theoretic view on automata
theory. Faculty of Mathematics and Computer
Science, TU/e. 2011-11
Z. Protic. Configuration management for models:
Generic methods for model comparison and model
co-evolution. Faculty of Mathematics and Com-
puter Science, TU/e. 2011-12
S. Georgievska. Probability and Hiding in Con-
current Processes. Faculty of Mathematics and
Computer Science, TU/e. 2011-13
S. Malakuti. Event Composition Model: Achiev-
ing Naturalness in Runtime Enforcement. Fac-
ulty of Electrical Engineering, Mathematics &
Computer Science, UT. 2011-14
M. Raffelsieper. Cell Libraries and Verifica-
tion. Faculty of Mathematics and Computer
Science, TU/e. 2011-15
C.P. Tsirogiannis. Analysis of Flow and Visibil-
ity on Triangulated Terrains. Faculty of Mathe-
matics and Computer Science, TU/e. 2011-16
Y.-J. Moon. Stochastic Models for Quality
of Service of Component Connectors. Fac-
ulty of Mathematics and Natural Sciences,
UL. 2011-17
R. Middelkoop. Capturing and Exploiting Ab-
stract Views of States in OO Verification. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2011-18
M.F. van Amstel. Assessing and Improving
the Quality of Model Transformations. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2011-19
A.N. Tamalet. Towards Correct Programs in
Practice. Faculty of Science, Mathematics and
Computer Science, RU. 2011-20
H.J.S. Basten. Ambiguity Detection for Program-
ming Language Grammars. Faculty of Science,
UvA. 2011-21
M. Izadi. Model Checking of Component Con-
nectors. Faculty of Mathematics and Natural
Sciences, UL. 2011-22
L.C.L. Kats. Building Blocks for Language
Workbenches. Faculty of Electrical Engineer-
ing, Mathematics, and Computer Science,
TUD. 2011-23
S. Kemper. Modelling and Analysis of Real-Time
Coordination Patterns. Faculty of Mathematics
and Natural Sciences, UL. 2011-24
J. Wang. Spiking Neural P Systems. Fac-
ulty of Mathematics and Natural Sciences,
UL. 2011-25
A. Khosravi. Optimal Geometric Data Struc-
tures. Faculty of Mathematics and Computer
Science, TU/e. 2012-01
A. Middelkoop. Inference of Program Properties
with Attribute Grammars, Revisited. Faculty of
Science, UU. 2012-02
Z. Hemel. Methods and Techniques for the
Design and Implementation of Domain-Specific
Languages. Faculty of Electrical Engineer-
ing, Mathematics, and Computer Science,
TUD. 2012-03
T. Dimkov. Alignment of Organizational Secu-
rity Policies: Theory and Practice. Faculty of
Electrical Engineering, Mathematics & Com-
puter Science, UT. 2012-04
S. Sedghi. Towards Provably Secure Efficiently
Searchable Encryption. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2012-05
F. Heidarian Dehkordi. Studies on Verifica-
tion of Wireless Sensor Networks and Abstrac-
tion Learning for System Inference. Faculty of
Science, Mathematics and Computer Science,
RU. 2012-06
K. Verbeek. Algorithms for Cartographic Visual-
ization. Faculty of Mathematics and Computer
Science, TU/e. 2012-07
D.E. Nadales Agut. A Compositional Inter-
change Format for Hybrid Systems: Design and
Implementation. Faculty of Mechanical Engi-
neering, TU/e. 2012-08
H. Rahmani. Analysis of Protein-Protein Interac-
tion Networks by Means of Annotated Graph Min-
ing Algorithms. Faculty of Mathematics and
Natural Sciences, UL. 2012-09
S.D. Vermolen. Software Language Evolution.
Faculty of Electrical Engineering, Mathemat-
ics, and Computer Science, TUD. 2012-10
L.J.P. Engelen. From Napkin Sketches to Reliable
Software. Faculty of Mathematics and Com-
puter Science, TU/e. 2012-11
F.P.M. Stappers. Bridging Formal Models - An
Engineering Perspective. Faculty of Mathemat-
ics and Computer Science, TU/e. 2012-12
W. Heijstek. Software Architecture Design in
Global and Model-Centric Software Development.
Faculty of Mathematics and Natural Sciences,
UL. 2012-13
C. Kop. Higher Order Termination. Faculty
of Sciences, Department of Computer Science,
VUA. 2012-14
A. Osaiweran. Formal Development of Control
Software in the Medical Systems Domain. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2012-15
W. Kuijper. Compositional Synthesis of Safety
Controllers. Faculty of Electrical Engi-
neering, Mathematics & Computer Science,
UT. 2012-16
H. Beohar. Refinement of Communication and
States in Models of Embedded Systems. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2013-01
G. Igna. Performance Analysis of Real-Time
Task Systems using Timed Automata. Faculty of
Science, Mathematics and Computer Science,
RU. 2013-02
E. Zambon. Abstract Graph Transformation
Theory and Practice. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2013-03
B. Lijnse. TOP to the Rescue - Task-Oriented
Programming for Incident Response Applications.
Faculty of Science, Mathematics and Com-
puter Science, RU. 2013-04
G.T. de Koning Gans. Outsmarting Smart
Cards. Faculty of Science, Mathematics and
Computer Science, RU. 2013-05
M.S. Greiler. Test Suite Comprehension for Mod-
ular and Dynamic Systems. Faculty of Electrical
Engineering, Mathematics, and Computer Sci-
ence, TUD. 2013-06
L.E. Mamane. Interactive mathematical docu-
ments: creation and presentation. Faculty of
Science, Mathematics and Computer Science,
RU. 2013-07
M.M.H.P. van den Heuvel. Composition and
synchronization of real-time components upon one
processor. Faculty of Mathematics and Com-
puter Science, TU/e. 2013-08
J. Businge. Co-evolution of the Eclipse Frame-
work and its Third-party Plug-ins. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2013-09
S. van der Burg. A Reference Architecture
for Distributed Software Deployment. Faculty
of Electrical Engineering, Mathematics, and
Computer Science, TUD. 2013-10