Galeria Kaufhof Technology Blog

Video recording: Talk on ‘Modeling Domain Objects: Best Practices’

2017-07-27T00:00:00+00:00

On July 12, 2017, we once again hosted the monthly Scala Meetup in Cologne. The speaker was our Kaufhof eShop colleague Valentin Willscher, on the topic "Modeling Domain Objects: Best Practices".

The slides for the talk are now available, and there is also a video recording of the talk:

Video recording: Talk on the eShop architecture at OOP 2017

2017-03-09T00:00:00+00:00

On January 31, 2017, we had the opportunity to explain the architectural approach of the Galeria Kaufhof online shop in detail in a talk at OOP 2017:

Scala Play2: Tolerant JSON body parsing with dedicated error handling

2016-05-16T00:00:00+00:00

I'm currently rewriting a Scala Play2 based web service that employs the following body parser:

def tolerantJsonParser[A](implicit reader: Reads[A]): BodyParser[A] =
  parse.tolerantJson.validate(json =>
    json.validate[A].asEither.left.map(err => Results.BadRequest)
  )(play.api.libs.iteratee.Execution.Implicits.trampoline)

def doSomething = Action.async(tolerantJsonParser[SomeThing]) { request =>
  val someThing: SomeThing = request.body
  ...

Not shown here is the implicit Reads that takes care of transforming the Json object into an object of case class SomeThing when the json.validate method is called.

As you can see, the existing code handles the incoming Json in a tolerant manner - that is, it doesn't bail out if the media type of the body is not application/json. Bailing out is what Play does by default, but it is not what we want in this case, because clients send requests with a more specific media type to this web service.

If the Json object to case class transformation fails, the body parser correctly answers the request with a 400 Bad Request response status code. This covers all cases where the body is valid Json, but cannot be mapped to the structure of the case class using the implicit Reads.

Another case is implicitly covered, too - if the request body isn't even Json to begin with (e.g. because a { is missing, as in "foo":"bar"}), then parse.tolerantJson fails, resulting in a failure response, too.

However, the latter case is handled by Play2, resulting in a generic HTML error response - but for the rewrite, I wanted to have dedicated error handling because my goal was to send a specific Json encoded error response.

The solution turned out to be quite simple (which didn't stop me from taking several hours to come up with it) - by parsing the Json myself using the parse.tolerantText body parser, I gained full control over the body parsing process, which allowed me to react to errors in both steps in the process - the text-to-json transformation as well as the json-to-case-class-object transformation:

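// Requires scala.util.Try (and Success/Failure in the action below)
def tolerantTryJsonParser[A](implicit reader: Reads[A]): BodyParser[Try[A]] = {
  parse.tolerantText.map { text =>
    Try {
      // Json.parse throws on malformed Json; JsResult.get throws if the
      // valid Json cannot be mapped to A - Try captures both failures
      Json.parse(text).validate[A].get
    }
  }
}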

def doSomething = Action.async(tolerantTryJsonParser[SomeThing]) { request =>
  request.body match {
    case Success(something) => ...
    case Failure(error) => ...
  }

How Cassandra’s inner workings relate to performance

2016-02-29T00:00:00+00:00

About

At Galeria.de, we learned the hard way that it’s critical to understand the inner workings of the distributed masterless database Cassandra if one wants to experience good performance during reads and writes. This post describes some of the details of how Cassandra works under the hood, and shows how understanding these details helps to anticipate which usage patterns work well and which don’t.

Network and node storage architecture

Roughly speaking, there are two main areas within the Cassandra architecture that play a deciding role with regard to query performance - the network of nodes that form the database cluster, and the local storage on each of those nodes.

Efficient queries must be efficient network-wise as well as storage I/O wise. Let’s dig into both areas and see how things work under the hood. If we understand the inner workings of both, we should be prepared to anticipate why certain table-structure/query combinations are efficient and some are not.

The network

A production Cassandra setup always consists of multiple nodes, where a node is one Cassandra server process on one system. All nodes are connected via the network. There isn’t any kind of “master” node - all nodes are created equal.

Logically, the data in a cluster is organized into keyspaces, which contain tables. Tables contain rows, and rows have columns.

Physically, the content of a table row is always stored on the hard drive of at least one node in the cluster, and, depending on how the keyspace has been defined upon creation, this row content is replicated to 0 or more other nodes in the cluster. If all of this doesn’t make sense now, it will once you’ve read this post.

In this post, we always assume that our setup is a cluster of 5 nodes, numbered 1 to 5. A 5 node Cassandra cluster is visualized as follows:

Note that this is a logical visualization, not a network diagram - in reality each node talks to all other nodes in the cluster, not only to its neighbours.

We further assume that we have created a keyspace “galeria” (which is going to hold the data tables we are going to create) as follows:

CREATE KEYSPACE galeria WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 3 };

The replication factor defines that whatever row we insert into a table within this keyspace is stored on three different nodes in the cluster.

We can now create a table “users” within this keyspace like this:

USE galeria;

CREATE TABLE users (
    username TEXT,
    firstname TEXT,
    lastname TEXT,
    PRIMARY KEY (username)
);

When we insert a row into this table as follows:

INSERT INTO users (username, firstname, lastname) VALUES ('jdoe', 'John', 'Doe');

then the following happens network-wise:

Step 1: Client-Coordinator connection

Our client (i.e., the process which issues the CQL statement) connects to a so-called coordinator node. This is not a special node within our cluster - any node that happens to receive a query from a client also happens to act as the coordinator for this query.

Step 2: Mapping the partition key value to a cluster node

The first thing the coordinator node needs to do upon receiving the insert query is to find out where in the cluster the row data for this INSERT needs to be persisted. Because username is the first (and in this case only) part of the primary key, it acts as the partition key. The value of the partition key column is what’s used by the coordinator to determine the first node on which to store the row. To do so, a hash function - the so-called partitioner - is applied to the value, and the result is a token. This token then tells the cluster about the target node, because each node is responsible for a certain range of tokens. Assuming that tokens run from 0 to 49 (in reality, the token range is much larger), we can visualize the tokens-to-nodes mapping as follows:

That is, node 3 holds those rows of table “users” in keyspace “galeria” for which the value of column “username” results in a hash function token from 20 to 29.

For example, let’s just make up that the username column value “jdoe” would result in token value 17. This means that the cluster must store the corresponding row at least on node 2.
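The mapping described above can be sketched in a few lines of Scala. This is only an illustration under the assumptions of our example: 5 nodes, tokens from 0 to 49, and each node owning a contiguous range of 10 tokens (node 1 owning 0 to 9 is inferred from the ranges mentioned above). The token function is a hypothetical stand-in for the partitioner, not Cassandra's actual hash function.

```scala
object TokenRing {
  // Assumptions from the example: 50 tokens, 5 nodes, 10 tokens per node.
  val tokenCount = 50
  val nodeCount = 5
  val tokensPerNode: Int = tokenCount / nodeCount

  // Hypothetical stand-in for the partitioner: maps a partition key
  // value to a token between 0 and 49. Real partitioners differ.
  def token(partitionKeyValue: String): Int =
    Math.floorMod(partitionKeyValue.hashCode, tokenCount)

  // Node 1 owns tokens 0-9, node 2 owns 10-19, ..., node 5 owns 40-49.
  def nodeForToken(token: Int): Int = token / tokensPerNode + 1
}
```

With the made-up token value 17 from the example, TokenRing.nodeForToken(17) yields node 2.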

Step 3: Determine replication nodes

“At least” because what also comes into play is the replication factor of the keyspace holding the user table (which contains the row in question). In our example, this factor is 3, which means that the row we create via the INSERT query needs to be stored on two more nodes, besides its “main” node, 2. The algorithm for this is simple - additional replicas of the row are stored on the next nodes clockwise in the ring - in our example, nodes 3 and 4.

Note that the replication of the write in question happens on all three nodes (2, 3, and 4) simultaneously, and not one-after-the-other. This detail is important because it explains why Cassandra is relatively optimistic regarding the to-disk-sync of a node’s commit log (see chapter “The node storage” for more on this).

As said, the replica order is based on the logical structure of the cluster - the cluster sees itself as an ordered ring structure, where each node has a “neighbour” node that comes “after” it in the ring.
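As a minimal sketch of this clockwise placement, again assuming our 5-node ring with nodes numbered 1 to 5 (the object and method names are made up for illustration):

```scala
object ReplicaPlacement {
  // Assumption from the example: a ring of 5 nodes, numbered 1 to 5.
  val nodeCount = 5

  // Starts at the "main" node and walks clockwise around the ring,
  // wrapping from node 5 back to node 1, until enough replicas are found.
  def replicaNodes(mainNode: Int, replicationFactor: Int): Seq[Int] =
    (0 until replicationFactor).map { offset =>
      (mainNode - 1 + offset) % nodeCount + 1
    }
}
```

For main node 2 and a replication factor of 3, this yields nodes 2, 3, and 4. For main node 5, the walk wraps around the ring and yields nodes 5, 1, and 2.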

Step 4: Wait for node write operations to finish and report back to the client

Once enough nodes have reported that their local write operations succeeded (see chapter “The node storage” for the details), the coordinator node in turn reports the success of the INSERT operation back to the client. Here, “enough” nodes depends on the relation between the replication factor of the keyspace and the consistency level of the query. If we insert a row into a table that belongs to a keyspace with a replication factor of 3, and the query was issued with a consistency level of QUORUM, then 2 nodes (the quorum of 3 nodes) acknowledging the write is considered a success by the coordinator.
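The quorum arithmetic is simple enough to write down. The following sketch uses made-up names and only covers the single-datacenter case discussed here: a quorum is more than half of the replicas.

```scala
object Consistency {
  // A quorum is a majority of the replicas: half of them (rounded down) plus one.
  def quorum(replicationFactor: Int): Int = replicationFactor / 2 + 1

  // The coordinator reports success once enough replicas have acknowledged.
  def writeSucceeded(acknowledgements: Int, replicationFactor: Int): Boolean =
    acknowledgements >= quorum(replicationFactor)
}
```

For a replication factor of 3, Consistency.quorum(3) is 2, so two acknowledgements are enough and the third replica may still be busy writing.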

The node storage

Let’s “zoom in” on our node 2 and have a look at what happens in terms of its local storage when it receives the write request from the coordinator node as a result of the INSERT query issued by the client. As noted, the same operations happen on nodes 3 and 4 simultaneously.

While the final goal is to have the data stored in a so-called SSTable on disk, several moving parts are involved in persisting the row data on a node:

Why are there three different storage mechanisms combined in order to persist data and make data retrievable? The reason is that only the interplay of these three mechanisms gives us a database that will, if used correctly, allow for efficient data writes, efficient data reads, as well as durable storage of large amounts of data - which is great because we certainly want a database that handles our INSERTs quickly, answers our SELECTs fast, doesn’t lose any of our data while doing so, and stores more data than what fits into expensive and therefore limited memory.

So, looks like we have four important qualities that we want to have covered on the storage level: fast writes, fast reads, plenty of capacity, durable storage.

Each one of the three mechanisms (commit log, MemTable, SSTables) covers at most three of these four qualities:

If we only cared about storing a lot of data in a durable way, the commit log would do - writing into it is fast, and it is stored on the hard disk (but finding and reading all data for a desired row is very slow).

If we only cared about read and write performance, the MemTable would do - writing to and reading from it is fast because all data of the MemTable lives in memory (but memory size is limited, and the data is not stored in a durable way).

If we only cared about fast reads and durable storage of lots of data, then the SSTables would do, because these are stored on disk in a structure that allows a desired data element to be located quickly (but writing data into this structure is slow).

If we want to cover all four qualities, all three mechanisms need to be combined. Let’s see how this works in practice.

The commit log

The first step taken storage-wise is to write the row data of our INSERT into the commit log.

The commit log is part of the process because it ensures data durability even in crash scenarios. Even if the server crashes during a write operation - if the data made it into the commit log, our data is safe and it can be recovered when the node comes up again.

Note that, as mentioned above, Cassandra is quite optimistic with regard to actually syncing the commit log writes to disk - by default, this happens every 10 seconds, but the node immediately acknowledges the write to the coordinator (after also writing to the MemTable, see below), without waiting for the fsync. This means that there is a window of up to 10 seconds during which, in case of a server crash, the data is not persisted on the hard drive of the crashing server node, although the coordinator will think it is.

What in theory sounds highly problematic in terms of data durability isn’t a big deal in practice. Cassandra assumes that data is always replicated, and two participating server nodes crashing within the same 10 seconds window is very unlikely.

The MemTable

Next in line is the so-called MemTable. Why does the MemTable exist? When receiving a read request, Cassandra cannot retrieve the requested data efficiently from the commit log. To allow for fast writes, the commit log is append-only, which means it contains data in the order in which write requests arrived. Retrieving all data for a row would therefore mean sequentially scanning the whole commit log from top to bottom, which would be prohibitively expensive in terms of disk I/O.

The layout of SSTables, on the other hand, is optimized for efficient lookup of disk-stored row data. However, Cassandra cannot update the SSTable holding the data for a given row synchronously upon each write, because this would result in a huge amount of random disk I/O operations, making the write scenario prohibitively expensive in terms of disk I/O. To circumvent this, SSTables are never updated - instead, they are created only from time to time, and are only written to once, and are then immutable (read-only) - new, additional SSTables are created to cover new row data or updates to existing row data.

Let’s close the circle: If row data cannot be retrieved from the commit log efficiently, and data isn’t put into SSTables immediately, then another data structure is required in order to answer read requests immediately (as soon as the data is written to the node) and efficiently.

And thus, MemTables come into play. Each server node has one MemTable for each table it carries. A MemTable lives, as the name implies, in memory, and is mutable, i.e., row data is read from it and written to it as needed, which thanks to the I/O performance of computer memory, is not prohibitively expensive - a fact that is nicely illustrated by one of my all time favorite tables, where typical computer-world timings are compared to typical timings that humans can relate to:

Event                           Latency    Scaled
1 CPU cycle                      0.3 ns     1 s
Level 1 cache access             0.9 ns     3 s
Level 2 cache access             2.8 ns     9 s
Level 3 cache access            12.9 ns    43 s
Main memory access               120 ns     6 m
Solid-state disk I/O          50-150 μs   2-6 days
Rotational disk I/O             1-10 ms  1-12 months
Internet: SF to NYC               40 ms     4 years
Internet: SF to UK                81 ms     8 years
Internet: SF to Australia        183 ms    19 years
OS virtualization reboot           4 s    423 years
SCSI command time-out             30 s   3000 years
Hardware virtualization reboot    40 s   4000 years
Physical system reboot             5 m     32 millennia

Thus, reading and writing data from and to main memory versus from and to disk is like waiting for your salad order to be finished in 6 minutes versus getting the salad somewhere between the day after tomorrow and next year. Sounds like a MemTable does the job.

And thus, right after appending the INSERT data to the commit log, the node puts the same row data into the MemTable structure. At this point, the data is both durable (commit log) and efficiently retrievable (MemTable), and thus, the data node can acknowledge the write to the coordinator node: Thanks, I have the data, and I’m able to provide it quickly if anyone asks.
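To make the order of these two steps more tangible, here is a deliberately simplified in-memory model of a node's write path. All names are hypothetical, the commit log is just a buffer instead of a file on disk, and timestamps are left out. It only mirrors the sequence described above: append to the commit log, update the MemTable, then acknowledge.

```scala
import scala.collection.mutable

object NodeWritePath {
  // Hypothetical, simplified representation of one written column value.
  final case class Write(partitionKey: String, column: String, value: String)

  final class Node {
    // Append-only: entries stay in the order in which the writes arrived.
    val commitLog: mutable.ArrayBuffer[Write] = mutable.ArrayBuffer.empty[Write]

    // Mutable in-memory structure: partition key -> (column -> value).
    val memTable: mutable.Map[String, Map[String, String]] =
      mutable.Map.empty[String, Map[String, String]]

    // Returns true as the acknowledgement sent back to the coordinator.
    def write(w: Write): Boolean = {
      commitLog += w // step 1: durability
      val row = memTable.getOrElse(w.partitionKey, Map.empty[String, String])
      memTable.update(w.partitionKey, row + (w.column -> w.value)) // step 2: fast reads
      true
    }

    // Reads are answered from the MemTable, never by scanning the commit log.
    def read(partitionKey: String): Map[String, String] =
      memTable.getOrElse(partitionKey, Map.empty[String, String])
  }
}
```

Reading a row is a direct lookup by partition key in the MemTable, while the commit log is only ever appended to.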

The SSTables

As already mentioned, this situation is fine for the moment, but without the third mechanism - SSTables - we’d quickly run into problems once the node has to hold more data than the size of its memory allows.

SSTables certainly are the most interesting data structure of the three. A new SSTable is created whenever the MemTable reaches a certain size (at which point it is considered “full”).

As said, SSTables are immutable, and this results in a certain fragmentation of row data.

Let’s assume we issue the following four write statements, spread over a longer period of time:

INSERT INTO users (username, firstname, lastname) VALUES ('jdoe', '', '');

UPDATE users SET firstname = 'John' WHERE username = 'jdoe';

UPDATE users SET lastname = 'Doe' WHERE username = 'jdoe';

UPDATE users SET firstname = 'John B.' WHERE username = 'jdoe';

Let’s further assume that between each of these operations, a lot of other CQL operations took place, and thus, between these four operations, the MemTable of the target node became full several times and has been flushed into new SSTables. Our row data is now distributed as follows:

Memtable: Knows nothing about the row anymore because it has been flushed

SSTable 1: Row data has been stored under partition key 'jdoe', with firstname = '' and lastname = ''

SSTable 2: Row data has been stored under partition key 'jdoe', with firstname = 'John'

SSTable 3: Row data has been stored under partition key 'jdoe', with lastname = 'Doe'

SSTable 4: Row data has been stored under partition key 'jdoe', with firstname = 'John B.'

Now imagine we would like to retrieve the full row data via SELECT firstname, lastname FROM users WHERE username = 'jdoe'. It’s not enough to look into the newest SSTable, because it only knows about the latest data change for the row. Cassandra has to go through all SSTables, and must put together the full set of latest row data, while also resolving multiple updates to the same column using the timestamp of the write event: In our case, the correct firstname value is John B. in SSTable 4, making the value stored in SSTable 2 irrelevant.
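The merge step can be sketched as follows. This is a simplified model with made-up types, not Cassandra's on-disk format: each SSTable is a map from partition key to columns, and every stored value carries the timestamp of its write. The timestamps 1 to 4 are assumptions that simply reflect the order of the four statements above.

```scala
object SSTableRead {
  // One stored column value together with the timestamp of its write.
  final case class Cell(value: String, timestamp: Long)

  // Simplified SSTable: partition key -> (column name -> cell).
  type SSTable = Map[String, Map[String, Cell]]

  // Collects all cells for the row from all SSTables and keeps,
  // per column, the cell with the newest timestamp.
  def readRow(ssTables: Seq[SSTable], partitionKey: String): Map[String, String] = {
    val cells = for {
      table <- ssTables
      (column, cell) <- table.getOrElse(partitionKey, Map.empty[String, Cell]).toSeq
    } yield column -> cell

    cells
      .groupBy { case (column, _) => column }
      .map { case (column, versions) =>
        column -> versions.map(_._2).maxBy(_.timestamp).value
      }
  }

  // The four SSTables from the example above.
  val ssTables: Seq[SSTable] = Seq(
    Map("jdoe" -> Map("firstname" -> Cell("", 1L), "lastname" -> Cell("", 1L))),
    Map("jdoe" -> Map("firstname" -> Cell("John", 2L))),
    Map("jdoe" -> Map("lastname" -> Cell("Doe", 3L))),
    Map("jdoe" -> Map("firstname" -> Cell("John B.", 4L)))
  )
}
```

Calling SSTableRead.readRow(SSTableRead.ssTables, "jdoe") puts together firstname = 'John B.' and lastname = 'Doe', and all four SSTables had to be consulted to get there.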

As said, the structure of an SSTable is optimized for efficient reads from disk - entries in one SSTable are sorted by partition key, and an index of all partition keys is loaded into memory when an SSTable is opened. Looking up a row therefore is only one disk seek, with further sequential reads for retrieving the actual row data - thus, no expensive random I/O needs to be performed.

However, if we run a Cassandra cluster over a long period of time, we get more and more SSTables. And because collecting the actual data for a requested row means searching through more and more SSTables, row reads become less efficient over time.

In order to avoid this kind of read performance deterioration, Cassandra runs a regular optimization process called compaction. This process takes multiple SSTable files and combines them into one new SSTable file - the result could, for example, look like this:

SSTable 1 (previously 1 & 2): Row data stored under partition key 'jdoe', with firstname = 'John' and lastname = ''

SSTable 2 (previously 3 & 4): Row stored under partition key 'jdoe', with firstname = 'John B.' and lastname = 'Doe'

After compaction, fewer files need to be searched in order to gather row data. (And there are other measures employed by Cassandra in order to further reduce disk operations - for example, a bloom filter is used to determine SSTables that can be skipped when looking up data).
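Conceptually, compacting two SSTables means merging them row by row and column by column, keeping only the newest value per column. The following sketch illustrates just this idea with hypothetical types. It does not model how Cassandra selects SSTables for compaction or anything else the real process takes care of.

```scala
object Compaction {
  // One stored column value together with the timestamp of its write.
  final case class Cell(value: String, timestamp: Long)

  // Simplified SSTable: partition key -> (column name -> cell).
  type SSTable = Map[String, Map[String, Cell]]

  // Merges two versions of a row, keeping the newest cell per column.
  private def mergeRows(a: Map[String, Cell], b: Map[String, Cell]): Map[String, Cell] =
    (a.keySet ++ b.keySet).map { column =>
      val newest = (a.get(column).toList ++ b.get(column).toList).maxBy(_.timestamp)
      column -> newest
    }.toMap

  // Combines two SSTables into one new SSTable; the inputs stay untouched.
  def compact(a: SSTable, b: SSTable): SSTable =
    (a.keySet ++ b.keySet).map { key =>
      key -> mergeRows(
        a.getOrElse(key, Map.empty[String, Cell]),
        b.getOrElse(key, Map.empty[String, Cell])
      )
    }.toMap
}
```

Compacting the first two SSTables of our example this way results in one SSTable that holds firstname = 'John' and lastname = '' for partition key 'jdoe', which matches the first line of the compaction result shown above.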

Performance of Cassandra operations

Musings about the performance of a Cassandra operation boil down to two questions: How much work will the coordinator node have to do in terms of network I/O, and how much work will a participating data node have to do in terms of local disk I/O?

The fastest operations are those for which the coordinator has to talk to only one data node, and where the approached data node can find the requested information by searching as little as possible through as few SSTables as possible. An expensive operation is one where the coordinator node has to talk to all nodes on the cluster, and where each of the nodes has to scan a lot through many SSTables to retrieve the requested information.

However, if we do need to retrieve a lot of data, we actually want to spread the burden of serving this data across multiple nodes, in order to lighten the burden each node has to shoulder - that is, after all, one of the main reasons for choosing a database that is distributed and therefore horizontally scalable.

That’s why so-called hotspots can be a problem. Imagine our partition key is a column holding the day of the week, as follows:

CREATE TABLE logins (
 day_of_week TEXT,
 username TEXT,
 PRIMARY KEY (day_of_week, username)
);

Every time a user logs into the system, we write their username into this table, with day_of_week set to "monday", "tuesday", etc.

Because the partition key decides about the node that has to store data for a particular row, each and every write to this table that happens on a Monday will have to be handled by the same node. Thus, the whole I/O burden for user login logging will be on one particular node for the whole day (and on another node on another day). Even if your Cassandra cluster is 1,000 nodes big - if the I/O throughput limit of the “Monday node” is exhausted on a given Monday, you will run into problems when trying to log yet another login event for this day.

If we want to retrieve the list of all usernames that logged in on Monday, this is inefficient, too. The query

SELECT username FROM logins WHERE day_of_week = 'monday';

will result in approaching the one node that maps to the “monday” partition key value, and the I/O burden of reading through all SSTable entries for this day of week is on this one node, while all other nodes stay idle.
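The effect can be simulated with a few lines of Scala. The sketch below assumes a 5-node cluster and uses a hypothetical hash-based nodeFor function as a stand-in for the partitioner. It counts how many writes each node receives, once with day_of_week as the partition key and once, purely for comparison, with username as the partition key.

```scala
object Hotspot {
  // Assumption: a cluster of 5 nodes, numbered 1 to 5.
  val nodeCount = 5

  // Hypothetical stand-in for the partitioner plus the token-to-node mapping.
  def nodeFor(partitionKeyValue: String): Int =
    Math.floorMod(partitionKeyValue.hashCode, nodeCount) + 1

  final case class Login(dayOfWeek: String, username: String)

  // Counts the writes per node for a given choice of partition key.
  def writesPerNode(logins: Seq[Login], partitionKey: Login => String): Map[Int, Int] =
    logins
      .groupBy(login => nodeFor(partitionKey(login)))
      .map { case (node, loginsOnNode) => node -> loginsOnNode.size }

  // 1,000 made-up login events, all happening on a Monday.
  val mondayLogins: Seq[Login] =
    (1 to 1000).map(i => Login("monday", s"user$i"))
}
```

Hotspot.writesPerNode(Hotspot.mondayLogins, _.dayOfWeek) contains a single node that receives all 1,000 writes, while partitioning by _.username spreads the same writes over several nodes. The second variant is no recommendation for this table, though - it would make the "all logins on Monday" query expensive, which is exactly the balancing act discussed next.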

The need for balancing table structures and query strategies between the two, often conflicting, goals of spreading data evenly around the cluster while also minimizing the number of partitions that have to be read is perfectly explained in Basic Rules of Cassandra Data Modeling, the first and most important document to read if one wants to use Cassandra.

While our post provides a glimpse under the hood, the Data Modeling post approaches its recommendations from an “outside” perspective, and teaches the what and how of data modeling and querying. If you mix in the “inside” view from our post, you should be well equipped to anticipate performance behaviours of your cluster. Look at a table structure and the queries operating on it. Then think about what happens on the network, and what happens on the disk of a node during query execution. This should set you on the right track for most cases.

As a final example of this, consider the following keyspace and table:

CREATE KEYSPACE galeria WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 1 };

CREATE TABLE products (
    name TEXT,
    color TEXT,
    PRIMARY KEY (name, color)
);

We fill this as follows:

INSERT INTO products (name, color) VALUES ('Towel', 'red');
INSERT INTO products (name, color) VALUES ('Towel', 'blue');

INSERT INTO products (name, color) VALUES ('Shirt', 'yellow');
INSERT INTO products (name, color) VALUES ('Shirt', 'red');

INSERT INTO products (name, color) VALUES ('Jacket', 'red');
INSERT INTO products (name, color) VALUES ('Jacket', 'blue');

Now let’s compare these two queries:

SELECT * FROM products WHERE name IN ('Towel', 'Jacket') AND color = 'red';

SELECT * FROM products LIMIT 1;

Which one is more efficient?

The first looks more complex - we are asking for two different partition key values, plus a clustering key value. The resultset is two rows, the red towel and the red jacket.

The second one looks simple: All we need is one row.

But in fact, the second query is about 10x as complex as the first one when it comes to query execution. We can even visualize this. Here is a screenshot showing the output of the statement trace for both queries (you get the tracing output for all following queries by issuing TRACING ON on the CQL shell). The statement trace lists all network and disk operations that need to be run in order to satisfy the query. I’ve taken screenshots of both text outputs, pasted them next to each other, and rotated the image by 90 degrees. The output for the first query is on top, the output for the second is beneath it.

Although no detail information is visible at this scale, we can clearly see how the second query required way more network and disk operations compared to the first. Why is this?

We have a keyspace with a replication factor of 1, that is, the row data for one partition key is stored on one node in the cluster. I’ve run both queries on a 3-node cluster. The first query specifically asks for two partition key values - thanks to the hash algorithm, the coordinator node can calculate which nodes it needs to connect to, and on those nodes, the index of the SSTables leads to the row data without unnecessary disk seek overhead.

The second query is much harder to satisfy: Because no partition key value is provided, the coordinator cannot know which nodes have rows for the table. It has to ask every single node. On each node, again because there is no partition key value, each SSTable must be sequentially scanned for possible row data.

And this is only on a 3-node cluster. Imagine running the queries on a 1000-node cluster - not much changes for the first query, because asking for two partition key values still means visiting at most two nodes (or only one, if both values happen to resolve to the same node). But for the second query, the situation is even worse: now 1000 nodes need to be visited for potential row data - even if only one single row is eventually returned.
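The network side of this comparison can be expressed as a small function. This is a sketch under the assumptions of the example (replication factor 1, so each partition key value lives on exactly one node) with a hypothetical hash-based nodeFor function instead of the real partitioner:

```scala
object CoordinatorFanOut {
  // Hypothetical stand-in for the partitioner plus the token-to-node mapping.
  def nodeFor(partitionKeyValue: String, nodeCount: Int): Int =
    Math.floorMod(partitionKeyValue.hashCode, nodeCount) + 1

  // With partition key values, the coordinator can calculate the target nodes.
  // Without any, it cannot know where rows live and has to ask every node.
  def nodesToContact(partitionKeyValues: Seq[String], nodeCount: Int): Set[Int] =
    if (partitionKeyValues.isEmpty) (1 to nodeCount).toSet
    else partitionKeyValues.map(nodeFor(_, nodeCount)).toSet
}
```

On a 1000-node cluster, nodesToContact(Seq("Towel", "Jacket"), 1000) contains at most two nodes, while nodesToContact(Seq.empty, 1000) contains all 1000.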

With our understanding of the network and storage mechanisms employed by Cassandra, this kind of behaviour can be anticipated, which helps to avoid unhealthy table structures and query strategies.

Compile Time Cassandra Injection in Play 2.4

2016-01-17T00:00:00+00:00

About

Play 2.4 supports Compile Time Dependency Injection. This post describes how to inject your own Cassandra repository object into a controller at compile time, while also initializing and closing a Cassandra connection session during application startup and shutdown, respectively.

The code of the final application is available at https://github.com/Galeria-Kaufhof/play2-compiletime-cassandra-di.

The goal

At the end of this post, we will have created a small Play 2.4.6 Scala application which will be able to serve the name of a product with a given id by reading information from a Cassandra database. We will be able to verify the correct behaviour of our application using a real database with an integration test and, because we will be able to mock the repository that is injected into the controller, we will also be able to verify the correct application behaviour without using a database.

The resulting code will be runnable and realistic, but also a bit simplistic - lacking error handling, for example - in order to be tractable.

Prerequisites

This post is aimed at readers who have already written Scala applications with Play2 and know how to work with sbt and Cassandra.

In order to compile and run the code, you need a Java SE 8 Development Kit, and you need a recent version of sbt.

Last but not least, you need to set up a Cassandra cluster - a one-node local setup is sufficient for the application that we'll create.

The post is written from the perspective of a Mac OS X system user with Homebrew installed, but should be adaptable for any Scala-capable environment with minor modifications.

Project setup

Let's start by creating an sbt-based Play 2.4 project using the Typesafe Activator, which we install via Homebrew: brew install typesafe-activator.

We can then use Activator to set up the Play2 project: activator new play2-compiletime-cassandra-di play-scala.

The first thing to do now is to switch from specs2 to ScalaTest as our testing framework, as described in Play2: Switching from specs2 to ScalaTest. Please change files build.sbt, test/ApplicationSpec.scala, and test/IntegrationSpec.scala as described there.

Running sbt test afterwards should just work. At this point, your codebase should look like the reference repository at 3a96b61.

Introducing the Cassandra driver

We are now going to integrate the Datastax Java Driver for Apache Cassandra, roughly following the steps outlined in Setting up a Scala sbt multi-project with Cassandra connectivity and migrations (but without tests and the migrations stuff to keep the codebase small for this post).

This means adding the Cassandra driver as a dependency to file build.sbt, creating a utility class for connection URIs in file app/cassandra/CassandraConnectionUri.scala, and adding an object that handles database connections in file app/cassandra/CassandraConnector.scala. See the resulting codebase on GitHub at 0fa30e4 or view the differences from the previous version of the codebase.

With this, we can finally start to work on the actual Cassandra repository and learn how to inject it into a controller.

Again, I'm trying to keep things simple and the codebase lean. We will inject the repository into the existing Application controller in file app/controllers/Application.scala.

Our repository has only one job: It allows controllers to retrieve exactly one row from the Cassandra table it manages using a single-column primary key. Imagine we have a Cassandra table called products, with the following structure:

+----+-------+
| id | name  |
+----+-------+
|  1 | Chair |
|  2 | Fork  |
|  3 | Lamp  |
+----+-------+

We can create the corresponding table structure (and its keyspace) as follows:

CREATE KEYSPACE IF NOT EXISTS test WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE test;
CREATE TABLE products (id INT PRIMARY KEY, name TEXT);

Let's make some assumptions: We expect our repository to provide a method getOneById(id: Int): ProductModel. Given a product id, the repository takes care of the heavy lifting that is required to return a ProductModel object which carries the data for this product as retrieved from the database.

We can start by declaring the model. It's a simple case class that lives in app/models/ProductModel.scala:

package models

case class ProductModel(id: Int, name: String)

Next, we start building a repository structure which keeps the API towards clients of repositories abstract, and allows us to create concrete implementations that work with a Cassandra database.

The first step is to define a trait that all repositories, be it concrete Cassandra-based implementations or lightweight mocks in test, will share. To do so, we put the following in file app/repositories/Repository.scala:

package repositories

trait Repository[M, I] {
  def getOneById(id: I): M
}

Simple enough. This ensures that repository implementations will provide a getOneById method, and type parametrization allows us to declare the type of parameter id and the type of the model object that has to be returned.
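To illustrate why this abstraction is useful for tests, here is a sketch of what a lightweight, database-free implementation of the trait could look like. The class name InMemoryProductsRepository is made up for this illustration and is not part of the reference repository. The model and the trait are repeated inside a wrapper object only to keep the snippet self-contained.

```scala
object RepositorySketch {
  // Repeated from the post so that the snippet compiles on its own.
  final case class ProductModel(id: Int, name: String)

  trait Repository[M, I] {
    def getOneById(id: I): M
  }

  // Hypothetical mock: serves products from a Map instead of Cassandra.
  // Like the rest of the post, it lacks error handling - asking for an
  // unknown id throws a NoSuchElementException.
  class InMemoryProductsRepository(products: Map[Int, ProductModel])
    extends Repository[ProductModel, Int] {
    override def getOneById(id: Int): ProductModel = products(id)
  }
}
```

A test could then create the mock with new InMemoryProductsRepository(Map(1 -> ProductModel(1, "Chair"))) and hand it to anything that expects a Repository[ProductModel, Int].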

We know that we have to query the repository for integer values, and we know that we want to retrieve a ProductModel in return. Thus, we can already declare the fact that our Application controller depends on such a repository, in file app/controllers/Application.scala:

package controllers

import models.ProductModel
import play.api._
import play.api.mvc._
import repositories.Repository

class Application(productsRepository: Repository[ProductModel, Int]) extends Controller {

  def index = Action {
    Ok(views.html.index("Your new application is ready."))
  }

}

At this point (commit d58d681 in the repo, diff), the application can no longer run and the existing test cases fail, because we have not yet implemented the mechanisms needed to actually inject the declared dependency into the controller. We will fix this later - first, we write a generic Cassandra repository implementation and use it to create a concrete implementation for the Repository[ProductModel, Int] type.

To do so, we create an abstract CassandraRepository class that does the heavy lifting, in file app/repositories/CassandraRepository.scala:

package repositories

import com.datastax.driver.core.querybuilder.QueryBuilder
import com.datastax.driver.core.querybuilder.QueryBuilder._
import com.datastax.driver.core.{Row, Session}

abstract class CassandraRepository[M, I](session: Session, tablename: String, partitionKeyName: String)
  extends Repository[M, I] {
  def rowToModel(row: Row): M

  def getOneRowBySinglePartitionKeyId(partitionKeyValue: I): Row = {
    val selectStmt =
      select()
        .from(tablename)
        .where(QueryBuilder.eq(partitionKeyName, partitionKeyValue))
        .limit(1)

    val resultSet = session.execute(selectStmt)
    // Note: one() returns null if the query matched no row
    resultSet.one()
  }

  override def getOneById(id: I): M = {
    val row = getOneRowBySinglePartitionKeyId(id)
    rowToModel(row)
  }
}

Some notes on this class: in addition to the type parameters for the model and id field, a concrete class that extends this abstract CassandraRepository needs to provide a session representing the connection to a Cassandra cluster, the name of the table that is to be wrapped, and the name of the field that is to be queried via getOneById. Furthermore, implementations must override the rowToModel method, because only a concrete implementation knows how to create a valid model from the values of a table row.
Finally, getOneRowBySinglePartitionKeyId is where stuff happens: using the session and the means provided by the DataStax driver, the database is queried and the resulting row is returned. Because CassandraRepository extends the Repository trait, it must override getOneById - in this case, that's simply a matter of retrieving the row using the code from the abstract class itself, and transforming it into a model using the to-be-overridden rowToModel method.
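
One thing the class above does not handle is a lookup that matches no row: the driver's one() then returns null, and rowToModel would be called with it. The following sketch shows one possible variant of the same template-method structure that returns an Option instead. It is not the code from the example repository: Row is replaced by a plain Map so the sketch runs without Cassandra, and LookupRepository, fetchRow, findOneById and FakeProductsRepository are hypothetical names.

```scala
object OptionalLookupSketch {
  // Stand-in for the driver's Row type, so the sketch runs without Cassandra
  type Row = Map[String, Any]

  abstract class LookupRepository[M, I] {
    // Stand-in for the database query; may return null, like one() on an empty result
    protected def fetchRow(id: I): Row

    // Only a concrete implementation knows how to build a model from a row
    def rowToModel(row: Row): M

    // Option(null) is None, so rowToModel is only called for rows that exist
    def findOneById(id: I): Option[M] = Option(fetchRow(id)).map(rowToModel)
  }

  case class ProductModel(id: Int, name: String)

  class FakeProductsRepository(table: Map[Int, Row])
    extends LookupRepository[ProductModel, Int] {
    protected def fetchRow(id: Int): Row = table.get(id).orNull

    def rowToModel(row: Row): ProductModel =
      ProductModel(row("id").asInstanceOf[Int], row("name").asInstanceOf[String])
  }
}
```

Whether an Option return type is worth the change to the Repository trait depends on how callers want to treat missing products; the sketch only illustrates the mechanics.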

With this in place, the concrete implementation of a Cassandra-backed ProductsRepository class that matches the Repository[ProductModel, Int] type looks like this, in file app/repositories/ProductsRepository.scala:

package repositories

import com.datastax.driver.core.{Row, Session}
import models.ProductModel

class ProductsRepository(session: Session)
  extends CassandraRepository[ProductModel, Int](session, "products", "id") {
  override def rowToModel(row: Row): ProductModel = {
    ProductModel(
      row.getInt("id"),
      row.getString("name")
    )
  }
}

With these changes in place (commit cb7a9b4 in the repo, diff), we can approach the injection. How can we control the way the Application controller is created, i.e. how can we control compile-time dependency injection? In Play 2.4, this happens by providing a class that extends the play.api.ApplicationLoader trait.

To do so, create file app/AppLoader.scala with the following content:

import components.CassandraRepositoryComponents
import play.api.ApplicationLoader.Context
import play.api.routing.Router
import play.api.{Application, ApplicationLoader, BuiltInComponentsFromContext}
import router.Routes

class AppLoader extends ApplicationLoader {
  override def load(context: ApplicationLoader.Context): Application =
    new AppComponents(context).application
}

class AppComponents(context: Context) extends BuiltInComponentsFromContext(context) with CassandraRepositoryComponents {

  lazy val applicationController = new controllers.Application(productsRepository)
  lazy val assets = new controllers.Assets(httpErrorHandler)

  override def router: Router = new Routes(
    httpErrorHandler,
    applicationController,
    assets
  )
}

Play2 is not able to find out about this AppLoader itself, which is why we need to configure it in file conf/application.conf by adding the line play.application.loader="AppLoader".

As you can see, we are overriding the part of Play2 that creates a runnable play.api.Application by instantiating the Application controller ourselves (while injecting the products repository, more on this later), and by overriding router creation. In order to get access to a products repository instance, the AppComponents class extends the CassandraRepositoryComponents trait. Within this trait, we connect to the database, set up the repository, and hook into the application lifecycle in order to shut down the database connection when the application shuts down. The code for all this goes into app/components/CassandraRepositoryComponents.scala:

package components

import cassandra.{CassandraConnector, CassandraConnectionUri}
import com.datastax.driver.core.Session
import models.ProductModel
import play.api.inject.ApplicationLifecycle
import play.api.{Configuration, Environment, Mode}
import repositories.{Repository, ProductsRepository}
import scala.concurrent.Future

trait CassandraRepositoryComponents {
  // These will be filled by Play's built-in components; should be `def` to avoid initialization problems
  def environment: Environment
  def configuration: Configuration
  def applicationLifecycle: ApplicationLifecycle

  lazy private val cassandraSession: Session = {
    val uriString = environment.mode match {
      case Mode.Test => "cassandra://localhost:9042/test"
      case _         => "cassandra://localhost:9042/prod"
    }
    val session: Session = CassandraConnector.createSessionAndInitKeyspace(
      CassandraConnectionUri(uriString)
    )
    // Shutdown the client when the app is stopped or reloaded
    applicationLifecycle.addStopHook(() => Future.successful(session.close()))
    session
  }

  lazy val productsRepository: Repository[ProductModel, Int] = {
    new ProductsRepository(cassandraSession)
  }
}

As you can see, this approach also allows us to adapt to the application environment: in our case, we connect to a different Cassandra cluster URI if we are running in the test environment.
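
The pattern used by this trait (abstract def members supplied by the class that mixes it in, a lazy val that builds the resource on first access, and a stop hook for cleanup) can be reduced to a sketch that runs without Play or Cassandra. All types here (Mode, Lifecycle, FakeSession, SessionComponents, TestComponents) are simplified stand-ins invented for this illustration, not Play's or the driver's real classes.

```scala
object ComponentsSketch {
  sealed trait Mode
  case object Test extends Mode
  case object Prod extends Mode

  // Minimal stand-in for an application lifecycle with stop hooks
  class Lifecycle {
    private var hooks: List[() => Unit] = Nil
    def addStopHook(hook: () => Unit): Unit = { hooks = hook :: hooks }
    // Runs the registered hooks, most recently added first
    def stop(): Unit = hooks.foreach(hook => hook())
  }

  // Stand-in for a database session
  class FakeSession(val uri: String) {
    var closed = false
    def close(): Unit = { closed = true }
  }

  trait SessionComponents {
    // Supplied by whoever mixes this trait in; `def` avoids initialization-order problems
    def mode: Mode
    def lifecycle: Lifecycle

    // Built on first access only, and registers its own cleanup
    lazy val session: FakeSession = {
      val uri = mode match {
        case Test => "cassandra://localhost:9042/test"
        case Prod => "cassandra://localhost:9042/prod"
      }
      val created = new FakeSession(uri)
      lifecycle.addStopHook(() => created.close())
      created
    }
  }

  class TestComponents extends SessionComponents {
    val mode: Mode = Test
    val lifecycle = new Lifecycle
  }
}
```

Because session is lazy, nothing connects until something actually asks for it, and the stop hook is registered at the same moment the resource comes into existence.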

At this point (commit 4e707cb, diff), dependency injection is in place and the application is runnable again.

What still doesn't run, however, are the tests. Also, it's a bit sad that we did all those things that give our controller a repository, and then it doesn't even use it. Let's fix both issues.

We start with the integration spec in file test/IntegrationSpec.scala. We are going to extend this quite a bit:

import java.io.File
import cassandra.{CassandraConnector, CassandraConnectionUri}
import org.scalatest.BeforeAndAfter
import org.scalatestplus.play._
import play.api
import play.api.{Mode, Environment, ApplicationLoader}

class IntegrationSpec extends PlaySpec with OneBrowserPerSuite with OneServerPerSuite with HtmlUnitFactory with BeforeAndAfter {

  before {
    val uri = CassandraConnectionUri("cassandra://localhost:9042/test")
    val session = CassandraConnector.createSessionAndInitKeyspace(uri)
    val query = "INSERT INTO products (id, name) VALUES (1,  'Chair');"
    session.execute(query)
    session.close()
  }

  override implicit lazy val app: api.Application =
    new AppLoader().load(
      ApplicationLoader.createContext(
        new Environment(
          new File("."), ApplicationLoader.getClass.getClassLoader, Mode.Test)
      )
    )

  "Application" should {

    "work from within a browser and tell us about the first product" in {

      go to "http://localhost:" + port

      pageSource must include ("Your new application is ready. The name of product #1 is Chair.")
    }
  }
}

We need to override the implicit app value, where we ask our new AppLoader to create an application for the Test environment. Furthermore, we extend our specification and now expect our app to not only greet us, but to also tell us about the name of the product with ID 1, which we insert in the new before step of our specification.

We could do the same with the ApplicationSpec, but let's go one step further and mock the ProductsRepository. This gives us a specification that, in contrast to the integration spec, doesn't need an actual Cassandra database to work and only verifies the behaviour of the Application controller itself, not its dependencies. To do so, we change file test/ApplicationSpec.scala as follows:

import java.io.File
import models.ProductModel
import play.api
import play.api.{Mode, Environment, ApplicationLoader}
import play.api.ApplicationLoader.Context
import play.api.test._
import play.api.test.Helpers._
import org.scalatestplus.play._
import repositories.Repository

class MockProductsRepository extends Repository[ProductModel, Int] {
  override def getOneById(id: Int): ProductModel = {
    ProductModel(1, "Mocked Chair")
  }
}

class FakeApplicationComponents(context: Context) extends AppComponents(context) {
  override lazy val productsRepository = new MockProductsRepository()
}

class FakeAppLoader extends ApplicationLoader {
  override def load(context: Context): api.Application =
    new FakeApplicationComponents(context).application
}

class ApplicationSpec extends PlaySpec with OneAppPerSuite {

  override implicit lazy val app: api.Application = {
    val appLoader = new FakeAppLoader
    appLoader.load(
      ApplicationLoader.createContext(
        new Environment(
          new File("."), ApplicationLoader.getClass.getClassLoader, Mode.Test)
      )
    )
  }

  "Application" should {

    "send 404 on a bad request" in {
      val Some(wrongRoute) = route(FakeRequest(GET, "/boum"))

      status(wrongRoute) mustBe NOT_FOUND
    }

    "render the index page and tell us about the first product" in {
      val Some(home) = route(FakeRequest(GET, "/"))

      status(home) mustBe OK
      contentType(home) mustBe Some("text/html")
      contentAsString(home) must include ("Your new application is ready. The name of product #1 is Mocked Chair.")
    }
  }

}
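
The mechanism that FakeApplicationComponents relies on, overriding a lazy val in a subclass so that everything wired from it sees the replacement, can be shown in isolation. This is a simplified sketch without Play; Components, FakeComponents, RealRepository and greeting are hypothetical names used only here.

```scala
object OverrideSketch {
  trait Repository[M, I] { def getOneById(id: I): M }
  case class ProductModel(id: Int, name: String)

  class RealRepository extends Repository[ProductModel, Int] {
    // Stands in for a database-backed implementation
    def getOneById(id: Int): ProductModel =
      sys.error("would need a database connection")
  }

  class Components {
    lazy val productsRepository: Repository[ProductModel, Int] = new RealRepository

    // Anything wired from productsRepository picks up an overridden value
    lazy val greeting: String =
      s"The name of product #1 is ${productsRepository.getOneById(1).name}."
  }

  class FakeComponents extends Components {
    // Replaces the real repository; RealRepository is never instantiated
    override lazy val productsRepository: Repository[ProductModel, Int] =
      new Repository[ProductModel, Int] {
        def getOneById(id: Int): ProductModel = ProductModel(id, "Mocked Chair")
      }
  }
}
```

Since both members are lazy, the real repository is never created in FakeComponents, which is why the mocked spec needs no database.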


Of course, both specs will only pass if we change the behaviour of the Application controller in file app/controllers/Application.scala and make use of the injected repository:

package controllers

import models.ProductModel
import play.api._
import play.api.mvc._
import repositories.Repository

class Application(productsRepository: Repository[ProductModel, Int]) extends Controller {

  def index = Action {
    val product = productsRepository.getOneById(1)
    Ok(views.html.index(s"Your new application is ready. The name of product #1 is ${product.name}."))
  }

}

And that's it. At commit 796b6c6 (see the diff), we have a working Play 2.4 app with a compile-time injected Cassandra repository.

Video: The Working World at Galeria.de

2016-01-15T00:00:00+00:00

Together with GALERIA Kaufhof's HR marketing team, we produced the following four videos, which present the working world of IT and product management at Galeria.de and offer a look into our organization and culture.

The main video gives an overview; the other videos look in detail at the work of an Interface Developer, a Product Owner, and the Head of Technical Platform.

The Architecture of the Galeria.de Platform in the Context of the Product Development Organization

2015-12-15T00:00:00+00:00

About this article

Within this blog, this article is an update and extension of the post from September 2014.

Even back then, there was a clearly defined set of guidelines that set the framework for macro- and microarchitecture questions and guides the project on matters of system and software architecture.

Over the past few days, starting from our experience to date and our current perspective, we have begun to write down some of the foundations of our architecture anew.

One trigger for this was the launch of http://scs-architecture.org/, a portal presenting the concept of Self-contained Systems, which are also the central building block of our architecture.

In substance we have lived the SCS concept for a long time, but semantically the approach had no clear place on our architecture map. With this revision, all central building blocks now have a clear place and a clear name.

For a project that has been running for almost 2 years and has been in operations and continued-development mode for almost a year, the question of functional growth also becomes interesting. As clear as the existing structure is, what are the architectural guidelines when the range of business topics grows and the platform evolves in its content? After all, we are still at the beginning of our mission to become the MCR market leader in Europe.

In addition, this document touches more strongly on the relationship between product architecture and the product development organization (without claiming to lay out the organizational and process structure of Galeria.de product development in full - that has to happen in other posts).

The visualization

The following diagram is meant to give an overview of the various architecture components and how they relate to one another:

Foundations of the architecture

Two basic ideas form the foundation of the architectural structure: a vertical orientation of the high-level components in so-called Self-contained Systems, and a business-motivated separation and grouping of these components in so-called domains.

The relationship between domains and systems is as follows: a domain exists whenever one or more systems fully cover a logically coherent slice of a user's business use cases. A concrete example: the SEARCH domain at Galeria.de comprises those systems that, from the user interface down to data storage, enable users of Galeria.de to search for and find products.

So the SEARCH domain has at least one system assigned to it that both provides the web UI elements (such as the search box with auto-complete, the search results page, and so on) and implements the import of product data and its transfer into a specialized search database.

A system, in turn, is a collection of applications with shared data storage, which in our case also all live in the same code repository and are deployed together - for a Scala domain, this could concretely be an sbt multi-project consisting of a Play2 application for the web interface plus Akka-based applications for background data processing. Our Ruby domain CONTROL, for its part, runs a system that has internally evolved strongly toward a microservice-based structure and uses, among other things, a message-bus-oriented approach as its shared data storage.

These systems talk to each other - within a domain boundary and beyond it - only via defined interfaces (since they must not share data), and they do so while avoiding distributed call stacks.

In a sense, the well-known paradigm of loose coupling and high cohesion, which is classically considered at the level of a single software system, is continued here at a higher level.

Cohesion arises because related business topics are brought together in the Self-contained Systems of a domain. Loose coupling is achieved by having the different systems communicate with each other only via interfaces.

The overall system therefore has the same property that holds within a software system designed according to this paradigm: changes in one component require changes in another component only if they affect the interface.

This makes the overall system highly robust (without distributed call stacks, and thanks to data replication, other systems can keep operating even when a connected system becomes unavailable), enables largely autonomous work per domain (not least with regard to release frequency), makes it possible to scale systems differently according to their different requirements, and allows a choice of technologies per system that precisely fits the required functionality. scs-architecture.org provides further information on this.

The concept of the domain is also the bridge between architecture and organizational structure. Ideally, each domain is backed by one team - in our case a Scrum team - that fully “owns” the domain and its systems, both functionally and technically, from requirements management through software development and QA to operations.

Domains and their systems

Why, then, still distinguish between domain and system? Why not 1 domain equals 1 system? This is where our previous model, in which the terms were often used synonymously, is currently evolving.

The reason for the distinction is that a component cut can be motivated by the business on the one hand and by technology on the other. A purely technical motivation leads to a new system, a business motivation to a new domain. The notion of “technical motivation” is fairly broad, however - it does not necessarily require the introduction of a new technology (programming language, framework, database system, and so on): even with an unchanged technology stack, there can be a technical motivation in wanting to keep the codebase clean or to release at a finer granularity.

There are currently examples of both motivations in the project. The team of the existing EXPLORE domain, which so far has mainly taken care of teasers and eye-catcher banners in the shop, is to take over responsibility for the infrastructure for content on Galeria.de that is not directly related to the customer's shopping experience, for example press pages and company information as well as recruiting content.

Apart from the fact that a technically new solution based on an open source CMS is being built here, it quickly becomes clear that this leaves the existing EXPLORE business scope. The team is therefore currently founding a new domain, CONTENT. Even though the same people are on board for now, we regard the topic as a new business unit because of its different business focus.

Things are different in the ORDER team, which takes care of all business concerns around orders. Here it is becoming apparent that it could make sense to separate the webshop-oriented aspects of ordering from downstream order processing - for example, in view of the possibly very different scaling requirements for the shop on the one hand and downstream order processing on the other; a decoupling of these two aspects can also be expected to bring advantages in terms of release cycles and the cleanliness of the local architecture. Self-contained Systems are always monoliths - that is not negative per se, but any monolith can eventually become too big; “too big” is a very subjective property, but one yardstick is surely the by now widespread phrase “no longer fits completely into the head of one team member”.

Systems and environments

The diagram also speaks of environments. This refers to the context in which the applications of systems can run. Because systems are vertically oriented, the execution context of their applications is potentially distributed: a Play2-based Scala application runs on the servers of Galeria.de, that is, in the so-called platform environment - but this backend application may deliver a single-page application or some other form of JavaScript application, which then runs in the user's browser, that is, in the user's environment.

Furthermore, many systems in the platform environment talk to third-party systems, in the case of Galeria.de for example with merchandise management systems. These run in an external environment, that is, an environment that is technically and organizationally outside the control of Galeria.de.

Interfaces

Doing without shared data storage requires clearly defined and reliably working interfaces. Our guiding model is the World Wide Web, which is why we rely on HTTP as the transport protocol and design our interfaces according to the principles of REST.

We distinguish 4 types of interfaces:

Type “Web”
- builds on HTTP
- RESTful
- HATEOASful
- user-oriented media type
- classically the HTML-based web page generated by a backend application (or web page elements that the frontend integration includes via SSI)
Type “REST”
- builds on HTTP
- RESTful
- HATEOASful
- machine-oriented media type
- classically the JSON web service provided by a backend application and used by JavaScript applications, mobile apps, or third-party applications
Type “Multipart”
- builds on HTTP
- RESTful
- machine-oriented media type
- feed or snapshot interface based on HTTP multipart, for synchronizing bulk data and for event sourcing
Type “Other”
- interfaces that do not match the other types: FTP, SOAP, and so on. A wide variety of transport protocols and media types are conceivable.

Summary

The architectural approach of vertically oriented Self-contained Systems, paired with domains cut along business lines and HTTP-based interfaces, forms the framework for all product development initiatives at Galeria.de.

This framework enables optimal customer orientation through business specialization in the domain teams, a robust overall product thanks to the decoupling of data storage and call stacks, and effective further development thanks to autonomous release processes and a coordination process between the software development teams that is limited to the interfaces.

Developing and Operating a Symfony2 Web Application - Part 1

2015-10-27T00:00:00+00:00

About this article

Recently we faced the challenge of building a small online application for a time-limited discount campaign that had no connection whatsoever to the Galeria.de webshop.

While the technology stack around our online shop is based on Scala, Ruby and Cassandra, the decision here was to build the application outside our existing services and systems, and also outside the context of our Scala and Ruby teams, with the aim of not “disturbing” the normal product development process with this special project.

Furthermore, the scaling requirements that apply to the online shop, which are a central driver behind the technology decisions of our main stack, were not relevant here.

Because the deadline for this project was very tight, the technical solution was approached very pragmatically. The decision was made to build the application with the PHP-based Symfony2 framework and MySQL as the database. This combination is very well established and turned out to be exactly the right choice for this kind of project.

I would like to use this real-world project to walk the reader through all those details of the product development process that play a relevant role in writing and operating applications based on Symfony2 - covering aspects such as project setup, testing, database migrations, continuous delivery, security, and much more.

The approach chosen for this highlights all significant decisions, explains the implementation details that resulted from these decisions, and discusses their advantages and disadvantages. I will also point out those parts of the application that could be improved further.

Target audience

This post is aimed at PHP developers who have at least some initial experience working with Symfony2 and for whom, for example, working with Composer is familiar territory.

The requirements

Effective October 1, 2015, GALERIA Kaufhof GmbH became part of the Hudson’s Bay Company. Before that we were part of METRO GROUP. Before this transaction was completed, one very last discount campaign was set up for our now former colleagues at METRO: Good Buy METRO.

The use case was as follows: for a limited time, employees of certain METRO subsidiaries could register for the discount campaign via the web application described here, based on their email address and employee number. After completing registration, each user received an email with a PDF attachment on which a total of six personalized vouchers were printed. Each voucher contained a QR code that could be scanned at the checkout of one of our stores to receive the discount on the purchase.

At their core, the functional requirements were therefore:

- Allow access to a web application
- Enable registration via the web application based on email address and employee number
- Verify the validity of the employee number via an internal process, and the validity of the email address via a double opt-in procedure
- For each verified user, select six free codes from the pool of all discount codes
- Create a QR code for each of these codes
- Create a PDF document for the user from the six QR codes and send it to them by email

There were also non-functional requirements. To be able to react promptly and reliably to detail changes in the functional requirements during the short project phase, high test coverage was required. Furthermore, changes were always to be available in the production environment immediately, so that a tight feedback loop with the stakeholders was possible. This in turn required a fully automated continuous delivery pipeline, and one of the prerequisites for that was the use of database migrations.

Since the system would store personal data, an external security review was scheduled, and passing it with a good result was a further requirement. Load stability was also in focus - the application was only made available to a limited group of users, but since it was a special campaign with a tight time limit, a certain rush at the start of the campaign was at least possible. A load test was therefore also scheduled, with the requirement that the web application deliver good response times even under many parallel requests.

A further non-functional requirement was that the application should also be pleasant to use on mobile devices.

The implementation

Setting up the project

The project's README on GitHub offers a guide to setting up a Mac OS X system as a development environment for the application.

The first step in development was creating a new Symfony2 project. I decided on the current stable non-LTS version of Symfony, which at the time was 2.7.3. As much as I understand the idea of Long Term Support versions, I still prefer always working with a current stable version and upgrading promptly (perhaps after 2-3 minor releases) to a new stable version when one becomes available.

In my opinion, you walk into a trap if you stay on an older version for too long, a practice that LTS versions encourage. You simply lose touch, and a switch, which has to happen at some point, becomes ever more frightening, complex and expensive. Better to go through a little pain regularly (which is manageable anyway with good test coverage) and avoid the trap of eventually having a legacy system. For me, this approach is an example of the agile principle If It Hurts, Do It More Often - Martin Fowler, for example, offers good reading on this in FrequencyReducesDifficulty.

As described under Installing and Configuring Symfony, the Symfony Installer was downloaded and installed, and the project was then set up with symfony new goodbye-metro 2.7.3.

Symfony2 is the basis of the application in the backend, but a web application also has a frontend, and that too needs to be managed, for example with regard to external libraries and frameworks. Bower, the JavaScript package manager, was used for this. Bootstrap was defined as a dependency via the file bower.json in the project's root directory:

{
  "name": "goodbye-metro",
  "version": "0.0.1",
  "dependencies": {
    "bootstrap": "~3.3.5"
  }
}

To integrate Bower and Symfony2 sensibly, it is important to make sure that Bower places its libraries in the right target directory. Following the Symfony best practice, the application was to be built in the bundle AppBundle. The public web files for this bundle belong in src/AppBundle/Resources/public - Symfony's asset system mirrors this location to web/bundles/app, and from there the files can be served by the web server. Since Bower brings external code into our project (analogous to the external PHP libraries that Composer puts into vendor in the project root), it makes sense to place these in a vendor folder as well, so as not to mix them with internal frontend files.

To achieve this, the file .bowerrc was created with the following content:

{
  "directory": "src/AppBundle/Resources/public/vendor",
  "interactive": false
}

"interactive": false is useful for running Bower without it prompting for input on the command line.

Important detail: the external libraries managed by Bower should not become part of the git repository. Therefore the line src/AppBundle/Resources/public/vendor was added to the .gitignore file.

The Symfony Installer automatically downloaded the PHP dependencies into vendor/. For the frontend dependencies we have to act ourselves using Bower:

bower install

Migrations as the foundation for Continuous Delivery

This gave us a skeleton for the application to be built, for both the backend and the frontend. But this skeleton still had to be extended to meet the upcoming requirements.

A very central element in achieving continuous delivery is database migrations. Instead of making schema changes by hand, changes to the structure of a database are captured in code files that are part of the project repository like any other code. The database schema is thus versioned on the one hand, and on the other hand schema changes can be carried out without human intervention.

Once this procedure is set up, new code can be rolled out to the production environment automatically, even if that code expects a changed database structure - during the rollout, the database is automatically adapted to the structure the new code expects.

Database migrations are very easy to implement in Symfony2 projects because a bundle exists for this purpose. To install it (and automatically add it to the external dependencies managed by Composer), the following call suffices:

composer require doctrine/doctrine-migrations-bundle "^1.0"

The new bundle then had to be made known to the application's kernel by extending app/AppKernel.php with the entry

new Doctrine\Bundle\MigrationsBundle\DoctrineMigrationsBundle()

and the following configuration parameters had to be added to app/config/config.yml:

doctrine_migrations:
    dir_name: "%kernel.root_dir%/DoctrineMigrations"
    namespace: Application\Migrations
    table_name: migration_versions
    name: Application Migrations

To arrive at the first migrations, it made sense to create a first entity and to capture the creation of the corresponding database structure in exactly such a migration. Symfony2 provides all the tools needed to avoid doing this entirely by hand.

The most obvious candidate for this first entity was the user of the discount campaign, internally called Customer. Naming it User or, since the users were as a rule employees of the corporate group, Employee would certainly have been possible as well.

The Customer entity was generated interactively via php app/console doctrine:generate:entity. The result can be seen in src/AppBundle/Entity/Customer.php on GitHub.

Built-in tooling takes you straight from the new entity to the corresponding migration file: php app/console doctrine:migrations:diff determines the differences between the code (which already knows the Customer entity) and the database (which does not yet have a matching table) and creates a file with the corresponding SQL statements under app/DoctrineMigrations/ (see app/DoctrineMigrations/Version20150828083456.php on GitHub).

To synchronize the database with the code, simply run php app/console doctrine:migrations:migrate.

Note that newly generated migrations should always be applied immediately, before making further changes to entities. The diff command is very good at detecting the differences between entities and database, but it cannot take into account which unapplied migrations already exist. For example, if you run the diff command twice in a row right after creating the entity, you end up with two migration files that both have the same content (creating the table for the entity). Running migrate would then fail: after the first migration file has been applied, the second one tries to create the table that was just created.
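The pitfall described above can be reproduced with a toy model. The following JavaScript sketch is not how Doctrine works internally; it assumes a hypothetical in-memory database object, a diff function that only compares expected tables with existing tables, and a migrate function that applies unapplied migrations in version order. The second version number is made up for illustration.

```javascript
// Simplified model of "diff" and "migrate". All names are illustrative
// and do not mirror Doctrine's actual internals.

// Hypothetical in-memory database: existing tables plus applied versions.
function createDb() {
  return { tables: new Set(), appliedVersions: [] };
}

// "diff": compare the tables the code expects with the tables in the database.
// Like the command described above, it ignores migrations that are still pending.
function diff(db, expectedTables, version) {
  const missing = expectedTables.filter((t) => !db.tables.has(t));
  return { version, createTables: missing };
}

// "migrate": apply every migration that has not been applied yet, in order.
function migrate(db, migrations) {
  const pending = migrations
    .filter((m) => !db.appliedVersions.includes(m.version))
    .sort((a, b) => a.version.localeCompare(b.version));
  for (const m of pending) {
    for (const table of m.createTables) {
      if (db.tables.has(table)) {
        throw new Error(`Table "${table}" already exists (migration ${m.version})`);
      }
      db.tables.add(table);
    }
    db.appliedVersions.push(m.version);
  }
  return pending.length;
}

// Running diff twice without migrating in between yields two identical migrations.
const db = createDb();
const first = diff(db, ["customer"], "20150828083456");
const second = diff(db, ["customer"], "20150828083500"); // made-up version

let failure = null;
try {
  migrate(db, [first, second]);
} catch (err) {
  failure = err.message; // the second migration tries to create "customer" again
}
```

In this model, running migrate right after the first diff leaves nothing for a second diff to report, which is the workflow recommended above.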

Motivation for Continuous Delivery

So why all of this again? The goal is to create and use a Continuous Delivery pipeline, and migrations are, alongside tests, a necessary means to that end.

In software development at Galeria.de, Continuous Delivery is a central building block of our product development process, which is why we insisted on it even for this very small side project.

Anyone who develops software probably knows the phenomenon: in your own development environment everything works as desired, but on the production system the software behaves differently, and not infrequently incorrectly. Or: you are 95% done with the project, now you “only” have to release it, and you find that you still have not 5% but 30% of the effort ahead of you before you have really launched.

The excellent book Growing Object-Oriented Software, Guided by Tests illustrates this phenomenon in an interesting way. If you plotted a “stress curve” over the duration of the project, showing the level of stress or chaos in the project at any given point on the way to launch, it would classically look like this:

Stress/Chaos                                 
                                             
      ^                                      
      │                                      
      │                            *         
      │                            *         
      │                            *        
      │                           * *       
      │                           * *       
      │                           * *       
      │ **************************   *      
      +───────────────────────────────> Time
                                   |
                                Launch

In the time before the launch, things are relatively calm: you work away, the application takes shape and lives in the development environment, which is manageable and well understood. Then comes the launch phase, and things get hectic: software packages in production are at a completely different version, single nodes in the development environment become clusters with many nodes in production, network routes don’t work, the site is offline for several minutes on every deployment, and so on.

From this perspective, building an application has two aspects: establishing functionality, and establishing operational readiness. In the classic approach, the focus lies almost exclusively on the first aspect for nearly the entire project, and operational readiness is neglected. Continuous Delivery turns the tables:

Stress/Chaos                                 
                                             
      ^                                      
      │                                      
      │
      │
      │
      │ *
      │  *
      │   *                        *
      │    ************************ **
      +───────────────────────────────> Time
                                   |
                                Launch

With this approach, the tough challenges of getting the software ready for operation and of ensuring a working and reliable release process are moved to the beginning of the project (“Do the hard stuff first”). This is certainly demanding, because it usually takes a while until the software (which at this point is of course still extremely rudimentary in functionality and complexity) rolls out cleanly all the way to the production systems for the very first time. And all the while you want to “really get going” and build features rather than deal with setting up server systems this early.

But once this hurdle has been cleared, you reap the benefits of this approach for the entire lifetime of the application, and especially in the launch phase. By the time of the launch, the final release, i.e. making the application available to the actual customer, has already been practiced and rehearsed dozens, if not hundreds, of times. When a feature is finished, it is really finished: it delivers the required functionality, rolls out reliably to the customer, and works in live operation. From the customer’s point of view, functionality that fails at the latter is worth exactly as much as no functionality at all.

Continuous Delivery is of course not a one-off event: just as the application grows in functionality during the project, the delivery process grows with it. You are not immune to the occasional small surprise that needs fixing, but in my experience it is much easier to live with rare, smaller surprises than with one big surprise at the worst possible moment, the launch.

First tests

Back to the application. Migrations were now set up, and a working Continuous Delivery was the next goal. To avoid pursuing this in a vacuum, the next step was to build some first functionality and to prove its correctness with a test case. After all, the idea of a Continuous Delivery pipeline is that it publishes the software to production systems automatically, without further human intervention; since no human tests for correctness either, correctness must be ensured by automated test cases.

At this point only the Customer entity existed, and it did not really have any noteworthy behavior that would have been worth testing. The next step was therefore to create a first test case that would check relevant behavior of the application and that would prove, within a delivery run, that the application worked as expected on the target system. The focus was therefore on a functional test rather than a unit test. Units such as methods and classes are usually so isolated that their working within a test case says little about whether the application as a whole runs correctly on the target system. In other words, even a whole battery of passing unit tests does not tell me whether my Continuous Delivery results in an application that runs correctly for the user.

Fortunately, writing functional tests is very easy and convenient in Symfony2 applications. These tests are not based on real HTTP requests and responses, but they still exercise the integrated application as a whole and can therefore ensure that all relevant components behave correctly together.

Very little preparation was needed to be able to write functional tests. PHPUnit had to be declared as a dependency via require phpunit/phpunit "^4.8", and a phpunit.xml.dist had to be created in the project’s root directory (see phpunit.xml.dist on GitHub for its content).

Functional test cases can now be created by writing test classes that extend Symfony\Bundle\FrameworkBundle\Test\WebTestCase. The very first functional test case in the project, in the file src/AppBundle/Tests/Functional/RegistrationTest.php, looked like this:

<?php

namespace AppBundle\Tests\Functional;

use AppBundle\Tests\TestHelpers;
use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;

class RegistrationTest extends WebTestCase
{
    public function testContents()
    {
        $client = static::createClient();

        $client->request('GET', '/');

        $this->assertEquals(200, $client->getResponse()->getStatusCode());
    }
}

The test case itself is very small, but the fact that it passes proves that the integrated application is able to respond correctly to a request for the route /. As mentioned, no real HTTP requests are sent over a real wire: the $client created via Symfony2’s WebTestCase is merely a clever abstraction that always stays within the PHP runtime. However, the client runs against the fully integrated Symfony application, i.e. the test case can only succeed if dependencies, configuration, routing, controllers, database and so on work and interact correctly. For the intended goal this is entirely sufficient.
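The idea of an in-process test client can be illustrated independently of Symfony. The following JavaScript sketch is purely conceptual and does not reflect how Symfony’s test client is implemented: it assumes the application is a function from request to response, and all names (createApp, createTestClient) are invented for this illustration.

```javascript
// Conceptual sketch only: the "application" is a function from request to
// response, and the test client calls it directly, without any network I/O.

// Build an application from a route table of the form "METHOD /path" -> controller.
function createApp(routes) {
  return function handle(request) {
    const controller = routes[`${request.method} ${request.path}`];
    if (!controller) return { status: 404, body: "Not Found" };
    return controller(request);
  };
}

// The test client remembers the last response so a test can inspect it.
function createTestClient(app) {
  let lastResponse = null;
  return {
    request(method, path) {
      lastResponse = app({ method, path });
      return lastResponse;
    },
    getResponse: () => lastResponse,
  };
}

const app = createApp({ "GET /": () => ({ status: 200, body: "Hello" }) });
const client = createTestClient(app);
client.request("GET", "/");
```

Because routing and controllers are exercised together, a passing check on client.getResponse().status says more about the wiring of the application than a test of a single controller function would.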

This test case is run simply via php ./vendor/phpunit/phpunit/phpunit.

At this point an important first milestone had been reached: the application was set up in principle, changes to the database could be controlled from code thanks to migrations, and the structures needed for a first test case were in place. In other words, the first package for delivery had been put together; what was needed now was the pipeline to the production system through which the package could be delivered.

The upcoming part 2 of this series will cover the construction of the Continuous Delivery pipeline in full detail.

A “frontend middleware” on top of a shared-nothing architecture

2015-07-28T00:00:00+00:00

The challenge

Before I explain what exactly a “frontend middleware” is, let me first tell you why we invented it and why it might make sense for you to use one, too.

About one year ago, when I joined the e-commerce team at GALERIA Kaufhof working on their new webshop platform, everything was still in its early stages. The idea of having totally separated application “verticals” was the crazy new thing and the entire system was cut into loosely coupled services to keep the business logic separated. From a backend and business perspective this made perfect sense.

See our Jump - Ein Technologie-Sprung bei Galeria Kaufhof post (in German) for background information on the new webshop platform architecture.

But I realized very quickly that - from a frontend view - having five entirely separated functional teams working on things like integrating tracking solutions, affiliates or any other third-party content wasn’t very efficient. Five independent frontend devs had to individually learn tracking APIs, implement tracking snippets, manage things like accounts within their own application’s context, and so on. You get the point: it sounded like the same job being done multiple times by multiple people.

Wait, what’s the problem again?

If you are not a frontend developer you might wonder why we didn’t simply embed tracking and retargeting pixels into the markup, just as the vendors tell us. Let me explain. Directly embedding third-party code may introduce two major problems: performance issues and script errors. You might know that, by default, script tags in the markup are loaded synchronously and block the pageload while they load. Although most vendors have switched to asynchronously loading tags (using the async attribute), synchronous loading can cause enormous performance penalties if you’re not aware of the problem. Additionally there is always a risk that broken third-party scripts cause Javascript errors, which - in the worst case - break and halt your entire script logic.

Besides the common performance issues and risk of errors, directly embedding third parties is also horribly inefficient and unscalable. Agreed - for one single developer, placing one global pixel in the head of the outermost template in his blog, such an approach might be sufficient. But as soon as you start scaling, things become more and more unmaintainable. Imagine you split the application frontend among five or more teams and try to include 10 to 20 pixels. Each team needs to have a story (or at least some sort of task or ticket if you’re not into Scrum) for adding, editing or removing every single pixel. Add project management costs on top and it quickly becomes really expensive.

Even worse - having no dedicated owner for the integration of third parties means there is nobody to ensure that the technical integration is done correctly or in any way consistent among the teams. Not to mention that, in the worst case, you need a complete application deployment (involving all verticals) to update or change a single pixel. I hope this illustrates the problem.

The “data layer” approach

Having worked with tag management systems before, I got used to what I’d call the “data layer” approach pretty well. To put it simply: the website renders some kind of interesting business information (e.g. product attributes or shopping basket contents) into its page body. Often this is done using some Javascript API. The tagmanager software then takes this information and hands it over to third-party tools like analytics, affiliates or the like. A very simple example, illustrating the concept:

// common example of a "data layer"
var DataLayer = window.DataLayer || [];
DataLayer.push({
  product:{
    id: 1234567890,
    name: "Book",
    price: 19.95
  }
});

This technique felt like a sensible solution for our architecture of rather strictly separated components. Each vertical application could independently render data into the markup, and some frontend logic would consume this data and take care of the rest (i.e. dispatching data to third parties). While thinking about it, we wanted to take it even one step further and do fancy things like declarative tracking, channel recognition and tag management within our own context.
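To make the “consume this data” part more tangible, here is a minimal sketch of a data layer consumer. It is an assumption about how such a consumer could work, not our actual implementation: consumeDataLayer and onEntry are invented names, and the second product is made-up example data. The sketch first handles entries that were pushed before it ran and then wraps push so that later entries are handled as they arrive.

```javascript
// Sketch of a data layer consumer. "onEntry" stands in for whatever
// forwards the data to analytics or affiliate tools.
function consumeDataLayer(dataLayer, onEntry) {
  // 1. Handle everything the page pushed before this script ran.
  dataLayer.forEach((entry) => onEntry(entry));

  // 2. Wrap push so that later entries are handled as soon as they arrive.
  const originalPush = dataLayer.push.bind(dataLayer);
  dataLayer.push = (...entries) => {
    entries.forEach((entry) => onEntry(entry));
    return originalPush(...entries);
  };
  return dataLayer;
}

const layer = [{ product: { id: 1234567890, name: "Book", price: 19.95 } }];
const forwardedNames = [];
consumeDataLayer(layer, (entry) => forwardedNames.push(entry.product.name));

// An entry pushed after the consumer was attached (made-up example data).
layer.push({ product: { id: 42, name: "Pen", price: 1.5 } });
```

The point of this design is that the rendering side never needs to know whether the consumer has already loaded; it only ever pushes plain data.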

Meta tags to the rescue

From a technical perspective things were pretty obvious. We needed some API that the application could utilize for transporting information into the client space. Also, the implementation had to be completely generic and shouldn’t force any of the structural decisions (i.e. “verticals”) upon the API. An additional scripting API to call from within custom controllers would be nice, but wasn’t a top priority.

Being modern frontend guys, we always avoid inlined script tags whenever possible. So we decided to not rely on Javascript for passing the data around. Instead, we are using dedicated metatags containing JSON data for handing over information from the markup to our frontend middleware (which we decided to call “Data Abstraction Layer”, or DAL). An example metatag could look like this:

<!-- example for a page data object -->
<meta name="gk:dal:data" content='{
  "page":{
    "type":"homepage",
    "name":"Startseite"
  }}' />

This also gave us the important advantage that each vertical application could independently render any kind of data anywhere into a page’s markup (even within SSI contexts) without worrying about availability or load order of a library or API. Markup reliably loads before any scripts. For asynchronously loaded content we developed another, Javascript-driven solution based on broadcasting events top-down from the DAL to its plugins. I’ll talk about that later.
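The following sketch shows how data from several such metatags could be combined into a single object. It is a simplified assumption, not the DAL’s actual code: the function names are invented, the merge strategy (recursive, later values win) is a guess, and the language and user fields are made up for the example.

```javascript
// Sketch of the aggregation step, assuming a simple recursive merge.
// In a browser the input could be collected with something like:
//   Array.from(document.querySelectorAll('meta[name="gk:dal:data"]'), (m) => m.content)

function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Recursively merge source into target; later values win for conflicting keys.
function deepMerge(target, source) {
  for (const [key, value] of Object.entries(source)) {
    if (key === "__proto__") continue; // never touch the prototype chain
    if (isPlainObject(value) && isPlainObject(target[key])) {
      deepMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

// Parse each metatag's JSON content and fold everything into one object.
// Malformed JSON is skipped so that one broken tag cannot break the page.
function aggregateMetaData(contents) {
  const aggregated = {};
  for (const content of contents) {
    try {
      deepMerge(aggregated, JSON.parse(content));
    } catch (err) {
      // ignore malformed metatag content in this sketch
    }
  }
  return aggregated;
}

const pageData = aggregateMetaData([
  '{"page":{"type":"homepage","name":"Startseite"}}',
  '{"page":{"language":"de"},"user":{"loggedIn":false}}', // made-up fields
  "{not valid json",
]);
```

With this approach two verticals can contribute to the same top-level key (here page) without knowing about each other.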

Plugins, Rules and Repos

Behind the scenes, the DAL is pretty lightweight and straightforward. From a technical perspective it is just an advanced plugin loader, lazy loading its plugins using requirejs. On initialization it collects all metatags from the page, aggregates the contained data into one large object, loads its plugins depending on supplied rules and hands over data to each of them. The more specific magic is then going on inside each plugin’s code.

The plugins are standard AMD-modules returning a class with at least a constructor and a handleEvent method. When instantiated, the constructor receives a reference to the DAL module, the aggregated page data, and an optional configuration object. The handleEvent method does all the magic required for handling asynchronous events happening within a page’s lifecycle. A simple stub plugin could look like the following code (this time in CoffeeScript):

define "gk/lib/dal/demoAffiliate", ["thirdpartylib"], (thirdpartyLib) ->

  # A simple demo plugin. Just provides a class container with example plugin logic
  # @implements IDALService
  class DemoAffiliate

    # init our third party lib
    constructor: (@dal, @data, @config) ->
      thirdpartyLib.init("someaccountid")

    # handle async events
    handleEvent: (name, data, domain) ->
      if name is "addtocart"
        # notify my third-party backend about this event
        thirdpartyLib.notify("addtocart", data.product.id)

The plugin loading is based on a simple ruleset which, optionally, executes callbacks before loading a plugin. That allows for complex rule logic, which no tagmanager could offer out-of-the-box (e.g. if the page’s type is “checkout-complete”, the user is logged in, and has more than 3 articles in his basket). This even enables us to eventually replace our external tagmanager and host the entire affiliate integration right within our own git repository. If you are a developer and/or you ever worked with external tag management GUIs (or any other, less comfortable form of affiliate integration) you might know what a great relief that is. We are even able to fully unit test our affiliate pixels, integrated within our CD pipeline.
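A ruleset like this could be sketched as follows. The rule format, the when callback, the checkoutPixel plugin name and the shape of the page data (user.loggedIn, cart.articles) are all assumptions made for this example and not the DAL’s real configuration.

```javascript
// Hypothetical rule format: a plugin name plus an optional "when" callback.
const rules = [
  { plugin: "gk/lib/dal/demoAffiliate" }, // no callback: always load
  {
    plugin: "gk/lib/dal/checkoutPixel", // illustrative plugin name
    when: (data) =>
      data.page.type === "checkout-complete" &&
      data.user.loggedIn === true &&
      data.cart.articles.length > 3,
  },
];

// Return the names of all plugins whose rule matches the aggregated page data.
function selectPlugins(ruleset, data) {
  return ruleset
    .filter((rule) => typeof rule.when !== "function" || rule.when(data))
    .map((rule) => rule.plugin);
}

const onHomepage = selectPlugins(rules, {
  page: { type: "homepage" },
  user: { loggedIn: false },
  cart: { articles: [] },
});

const onCheckout = selectPlugins(rules, {
  page: { type: "checkout-complete" },
  user: { loggedIn: true },
  cart: { articles: ["a", "b", "c", "d"] },
});
```

The resulting list of module names could then be passed to the AMD loader, so that only the matching plugins are ever downloaded.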

Going async

Most modern websites don’t involve a new pageload for every action. Actions such as opening a layer, expanding some accordion or using the off-canvas navigation may happen anytime, asynchronously, without our DAL ever being notified. For such cases we developed the DAL.broadcast mechanism. It offers a simple, one-way message API that allows sending event notifications directly to the DAL. Whenever a script causes an asynchronous action that should be broadcast globally, the DAL.broadcast method can be called with the specific event name and an optional information object:

// broadcast a client event to the DAL
DAL.broadcast("product-addtocart", {
  id: "1234567ABC",
  name: "FABIANI Jeans",
  price: 19.95
});
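How such a broadcast might fan out to the plugins can be sketched in a few lines. This is an assumption for illustration, not the real DAL: createDal is an invented name, the optional domain argument of handleEvent is left out, and the error isolation is one possible design choice.

```javascript
// Minimal sketch of a broadcast mechanism; the real DAL is not shown in this post.
function createDal(plugins) {
  return {
    // Forward the event to every plugin's handleEvent method and
    // report how many plugins handled it without throwing.
    broadcast(name, data) {
      let handled = 0;
      for (const plugin of plugins) {
        try {
          plugin.handleEvent(name, data);
          handled += 1;
        } catch (err) {
          // Isolate failures so one plugin cannot block the others.
        }
      }
      return handled;
    },
  };
}

const received = [];
const demoDal = createDal([
  { handleEvent: (name, data) => received.push(`${name}:${data.id}`) },
  { handleEvent: () => { throw new Error("third-party failure"); } },
]);
const handledCount = demoDal.broadcast("product-addtocart", { id: "1234567ABC" });
```

Catching errors per plugin addresses the concern raised earlier that a broken third-party script should not halt the rest of the page logic.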

While this solved the issue of being notified about asynchronous events it introduced a new problem. We now needed to write specific controllers for any element that should fire an event. Having to write a dedicated controller for every single button actually felt quite ugly and would have caused tons of useless code. So we decided to make the tracking more declarative and introduced custom attributes that allowed us to track events without additional controller logic.

We identified three common types of client events we are usually interested in: click/touch, view and focus. These event types can be automatically handled and applied to the appropriate logic using the custom data-attribute syntax data-dal-event-{type}. This would also allow for future extensibility (thinking of swipe, pinchzoom, …). The following examples illustrate the basic principle:

<!-- simple click event without custom data -->
<button 
  name="loginLayerOpen"
  data-dal-event-click='{"name":"layer-login-open"}'
></button>

Using this declarative tracking it was now easily possible to apply tracking logic to elements without touching anything but the markup. For more advanced use cases it is also possible to append additional data directly to the event data, following the same type definition and argument signature we use for the DAL.broadcast function itself.

<!-- click event with custom data (a teaser id in this case) -->
<div 
  class="my-teaser"
  data-dal-event-click='{
    "name":"teaser-click",
    "data":{
      "teaserId":"my-fashion-teaser",
      "campaign":"some/fancy/campaign"
    }
  }' 
></div>
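Turning such an attribute into a broadcast call needs little more than a JSON parse and a sanity check. The sketch below is an assumption about how this could look; parseEventAttribute is an invented name, and the event delegation shown in the comment is just one possible way to wire it up in a browser.

```javascript
// Sketch: turn a data-dal-event-{type} attribute value into broadcast arguments.
// In a browser this could be wired up with event delegation, for example:
//   document.addEventListener("click", (e) => {
//     const el = e.target.closest("[data-dal-event-click]");
//     if (!el) return;
//     const evt = parseEventAttribute(el.getAttribute("data-dal-event-click"));
//     if (evt) DAL.broadcast(evt.name, evt.data);
//   });

function parseEventAttribute(attributeValue) {
  let parsed;
  try {
    parsed = JSON.parse(attributeValue);
  } catch (err) {
    return null; // malformed JSON: nothing to broadcast
  }
  if (parsed === null || typeof parsed !== "object" || typeof parsed.name !== "string") {
    return null; // an event needs at least a name
  }
  return { name: parsed.name, data: parsed.data };
}

const simpleEvent = parseEventAttribute('{"name":"layer-login-open"}');
const teaserEvent = parseEventAttribute(
  '{"name":"teaser-click","data":{"teaserId":"my-fashion-teaser","campaign":"some/fancy/campaign"}}'
);
const brokenEvent = parseEventAttribute("{oops");
```

A single delegated listener per event type keeps the markup as the only place a team has to touch when adding tracking to an element.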

Standards and conventions FTW

At this point you might ask: “Yeah, sounds nice, but what’s all the buzz about? This doesn’t look like rocket science!” Agreed (mostly). But, that’s only a small part of the cake. The majority of work went into defining conventions and standards for the various types and use cases. Which data do we have to pass to our DAL? What are the globally required fields? How do we name our pages? Which data do we need for which event and/or page? What are the more specific bits of information each vertical had to pass?

The answers to all those questions are very subjective and closely related to the field of business. Obviously, for e-commerce sites you have substantially different information you want to collect, compared to a content-driven online magazine. But in any case it boils down to some key metrics you want to collect and analyze for your business. E.g., a big part of the DAL’s data is passed to our RUM (Real User Monitoring) and BI (Business Intelligence) software, which are also implemented as DAL plugins. Thus, many of our own conventions and metrics are specific to this use case. Another major part is the affiliate and basket tracking integration.

Luckily, I had done most of that a long time before, when I initiated the project “tagmanager integration” for our old webshop. So I started writing down the important KPIs and metrics based on that historical data and the tracking-pixels from our tracking solution and our recommendation engine. I summed it all up in a table in our wiki, structured the table based on verticals and added a description for all keys. Then I wrote the integration stories for the vertical teams so they could integrate the right metatags and declarations in the markup. Awesome, or so I thought.

Well, it felt awesome until I looked at the actual implementation done by the teams. The problem was that I had not been very explicit in declaring the types (and especially the formatting) for the individual metrics. I simply defined something like “on the product detail page we need a product object with the following fields: name, price, category, …”. Even though I also provided some examples, this still led to fairly diverse implementations. Price formatting was a particular problem, because there were multiple interpretations of how a price should be expressed (e.g. "29,90€" vs. 29.9, which are both valid representations of the same price value).

Strong typing to the rescue

Since Javascript (and JSON, too) is a very loosely typed language, we needed to define abstract types (or interfaces if you’re using Typescript) that explicitly define how values have to be supplied to the DAL. This resulted in various fancy types, e.g. DALPageData, DALUserData or DALProductData, to name just a few. Let’s look at a part of the type definition for the DALProductData type:

// @interface IDALProductData
DALProductData = {
  
  /**
   * Internal ID of this product.
   */
  "productId" : {
    "type" : "String",
    "mandatory" : true
  },
  
  /**
   * EAN (International Article Number, see https://en.wikipedia.org/wiki/International_Article_Number_%28EAN%29) of this product.
   */
  "ean" : {
    "type" : "String",
    "mandatory" : true
  },
  
  /**
   * Object with price information.
   */
  "priceData": {
    "type": "DALPriceData",
    "mandatory": true,
    "apiVersion": 2
  },
  
  /* ... */

}

As you can see we provide a JSON structure defining the type and some other, optional fields (e.g. mandatory, deprecated, apiVersion and some more). This way we ensure backwards compatibility throughout the API because the teams can safely adapt to new API versions without introducing breaking changes.

When looking at the priceData attribute, you might notice how we used the DALPriceData type to solve the previously mentioned issues with the price formatting. Also, instead of just declaring price-related attributes directly within DALProductData, we defined our dedicated type for generic price information that we use for products, carts, orders and anything else that might come in the future. Here is an excerpt from the DALPriceData type:

// @interface IDALPriceData
DALPriceData = {
  
  /**
   * Current net price of the product or cart; *excluding* VAT, shipping or discount.
   */
  "net": {
    "type": "float",
    "mandatory": true,
    "apiVersion": 2
  },
  
  /**
   * VAT part of 'net' price.
   */
  "VAT": {
    "type": "float",
    "mandatory": true,
    "apiVersion": 2
  },
  
  /**
   * Total price (= net + VAT - discount); *including* VAT and *after* subtracting discount.
   */
  "total": {
    "type": "float",
    "mandatory": true,
    "apiVersion": 2
  },
  
  /* ... */

}

The curious reader might ask: “Hey, what’s the purpose of defining these as Javascript objects?” Well, within the metatags the data is supplied as plain object literals. But in our test infrastructure (based on Selenium) we can now run over the entire page, take the above type definitions and compare them to the actual implementation. If there are mismatches, e.g. because a vertical didn’t implement a type correctly, we raise an error and get a notification.
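A comparison of this kind could be sketched as follows. This is not our actual test code: the registry is a trimmed-down copy of the definitions above, the validate function and its error messages are invented for this example, and the price values are made up.

```javascript
// Sketch of a validator for definitions in the format shown above.
// The registry and the checks are assumptions, not the actual test code.
const typeRegistry = {
  DALPriceData: {
    net: { type: "float", mandatory: true },
    VAT: { type: "float", mandatory: true },
    total: { type: "float", mandatory: true },
  },
  DALProductData: {
    productId: { type: "String", mandatory: true },
    ean: { type: "String", mandatory: true },
    priceData: { type: "DALPriceData", mandatory: true },
  },
};

// Checks for the primitive type names used in the definitions.
const primitiveChecks = {
  String: (v) => typeof v === "string",
  float: (v) => typeof v === "number" && Number.isFinite(v),
};

// Return a list of human-readable errors; an empty list means the data is valid.
function validate(typeName, data, path = typeName) {
  const definition = typeRegistry[typeName];
  const errors = [];
  for (const [field, spec] of Object.entries(definition)) {
    const value = data == null ? undefined : data[field];
    const fieldPath = `${path}.${field}`;
    if (value === undefined) {
      if (spec.mandatory) errors.push(`${fieldPath} is missing`);
    } else if (primitiveChecks[spec.type]) {
      if (!primitiveChecks[spec.type](value)) {
        errors.push(`${fieldPath} should be ${spec.type}`);
      }
    } else if (typeRegistry[spec.type]) {
      // Nested DAL type: validate recursively with an extended path.
      errors.push(...validate(spec.type, value, fieldPath));
    } else {
      errors.push(`${fieldPath} has unknown type ${spec.type}`);
    }
  }
  return errors;
}

// Example: a missing EAN and a price formatted as a string (made-up values).
const productErrors = validate("DALProductData", {
  productId: "1234567890",
  priceData: { net: 25.13, VAT: 4.77, total: "29,90€" },
});
```

Run against the data found on a page, a non-empty error list is exactly the kind of mismatch that should fail the test run and notify the responsible team.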

Conclusion

So, finally, what is a “frontend middleware”? I introduced this term to describe a software architecture that “sits” between client and third-party space and provides a complete abstraction between those two. It replaces common tagmanagers, affiliate pixels and the like. At the same time it provides a declarative tracking API and deep integration with real-user monitoring, web analytics and other frontend-only tools commonly served by third parties (surveyforms, promolayers, onsite A/B testing, etc.). It is designed with verticalized, shared-nothing architectures, distributed over multiple functional teams, in mind. Nevertheless it should also play nicely with any classic, monolithic architecture.

Thanks for your attention!

Wait - you actually read this far? Wow. You’re either seriously bored or might be really into this stuff. Did you know we are still looking for talented frontend devs? Get in touch for more information.

GOTOnight Cologne on Microservices at Galeria.de Headquarters

2015-06-22T00:00:00+00:00

On June 22, 2015, Galeria Kaufhof hosted a GOTOnight community event on Microservices. Over 60 attendees listened to three great talks on the topic by Dennis Traub, Stefan Tilkov and Dave Thomas, and enjoyed the great buffet that was provided by Dinea.

The talks were followed by a panel session with all three speakers, with lots of questions from the audience and a lively discussion between Stefan, Dave and Dennis.

We have recorded a video of Dennis Traub’s talk Taming the Monolith - Are Microservices just an implementation detail?. Note, however, that the video was made using the live streaming app Periscope, which seems to only allow videos in portrait mode, and includes comments by people following the stream that are not affiliated with the event itself:

The slides of Stefan Tilkov’s Talk Microservices: Awesome, as long as they are neither ‘micro’ nor ‘services’ are available, too:

The event also sparked quite some activity on Twitter:

Last slide preparations by @stilkov. #GOTOnight #Cologne @galeriakaufhof pic.twitter.com/nL0tTlIyvI
— Manuel Kiessling (@manuelkiessling) 22. Juni 2015

The place is filling up at #GOTOnight Cologne pic.twitter.com/0vLKqNRP6J
— Dennis Traub (@DTraub) 22. Juni 2015

The room’s packed at #GOTOnight Cologne // @GOTOber pic.twitter.com/26aR84ib4Z
— Dennis Traub (@DTraub) 22. Juni 2015

Fantastischer Abend @GOTONights mit @DTraub, @stilkov und @daveathomas! Danke @manuelkiessling
— Wolfram Eberius (@eberius) 22. Juni 2015

Thanks for hosting this cool event #gotonight @manuelkiessling. It was really great to hear @stilkov and @DTraub "live und in farbe".
— Daniel Müller (@dotnetgeek) 22. Juni 2015

@stilkov explains microservices at ou @GOTONights in Cologne 🙌 pic.twitter.com/yXSQeBNrZR
— Freek van Gool (@JavaFreekNL) 22. Juni 2015

@DTraub @stilkov Thanks for the great evening. You gave me some new food for thought. #Microservices #GOTOnight
— Sascha Dittmann ☁ (@SaschaDittmann) 23. Juni 2015

"There's no ubiquitous language!" @DTraub at @GOTONights Cologne #Microservices pic.twitter.com/KYsH4xCeSJ
— Giant Swarm (@giantswarm) 22. Juni 2015

Great evening @GOTONights in Cologne. With talks from @stilkov @DTraub and @daveathomas. Thanks! #microservices
— Thomas Terkatz (@thomasterkatz) 22. Juni 2015

A legend on stage! @daveathomas giving the third talk at the #GOTOnight Cologne pic.twitter.com/JMk6IOSdwl
— Dennis Traub (@DTraub) 22. Juni 2015

Full house in Cologne! /cc @GOTONights @kaesfiehs @stilkov @DTraub @daveathomas pic.twitter.com/zL4RrnHiH8
— Freek van Gool (@JavaFreekNL) 22. Juni 2015

Final panel discussion at our @GOTONights in Cologne. With @stilkov @DTraub and @daveathomas pic.twitter.com/80jpatfs4s
— Freek van Gool (@JavaFreekNL) 22. Juni 2015