I don’t have much experience building these kinds of systems (I like Kafka, but I must say I prefer Nabokov’s rendition of similar ideas in “Invitation to a Beheading” and “Pale Fire” more), but here are a couple of things that come to mind.
Even if it is a gigantic complex system with many moving parts, start with spiking an end-to-end solution which can handle one particular variation of a happy path. Build the skeleton first; flesh can be added incrementally.
To do this, you’ll need some way to actually run the entire system while it isn’t deployed yet, which is something you need to solve before you start writing pages of code.
Similarly, include a testing strategy in the specification, and start with one single simple end-to-end test. I think that TDD as a way to design a class or a function is mostly snake oil (because “unit” tests are mostly snake oil), but the overall large-scale design of the system should absolutely be driven by the way the system will be tested.
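To make the last two points concrete, here is a minimal sketch of what "run the whole system before it is deployed" plus "one simple end-to-end test" could look like. Everything in it is made up for illustration: `Ingest`, `Worker`, and `System` are hypothetical components, and an in-memory `queue.Queue` stands in for whatever message broker the real system would use.

```python
import queue


class Ingest:
    """Hypothetical front door: accepts input and puts it on the bus."""

    def __init__(self, bus: queue.Queue):
        self.bus = bus

    def submit(self, text: str) -> None:
        self.bus.put(text)


class Worker:
    """Hypothetical consumer: takes messages off the bus and stores results."""

    def __init__(self, bus: queue.Queue, store: list):
        self.bus = bus
        self.store = store

    def drain(self) -> None:
        # Process everything currently queued, then return.
        while True:
            try:
                message = self.bus.get_nowait()
            except queue.Empty:
                return
            self.store.append(message.upper())


class System:
    """The entire system wired together in one process, no deployment needed."""

    def __init__(self):
        self.bus = queue.Queue()  # assumption: stand-in for a real broker
        self.store = []           # assumption: stand-in for a real database
        self.ingest = Ingest(self.bus)
        self.worker = Worker(self.bus, self.store)

    def step(self) -> None:
        self.worker.drain()


def test_happy_path() -> None:
    # One variation of the happy path, exercised from input to stored result.
    system = System()
    system.ingest.submit("hello")
    system.step()
    assert system.store == ["HELLO"]


test_happy_path()
```

The test only talks to `System`, so the in-memory pieces can later be swapped for real ones without rewriting it, which is the sense in which the test strategy shapes the large-scale design.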
The code architecture of your solution is going to be isomorphic to your org chart, not to your deployment topology. Let’s say you want to deploy three different services: foo, bar, and baz. Just put all three into a single binary, which can be invoked as app foo, app bar, and app baz. This mostly solves any code duplication issues: if there’s shared code, just call it!
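A rough sketch of the single-binary idea, using Python's standard `argparse` for dispatch. The service bodies and the shared `normalize` helper are placeholders invented for the example; only the names foo, bar, and baz and the `app <service>` invocation come from the text above.

```python
import argparse


def normalize(payload: str) -> str:
    """Shared code: every service just calls it, nothing is duplicated."""
    return payload.strip().lower()


def run_foo(payload: str) -> str:
    return f"foo handling {normalize(payload)}"


def run_bar(payload: str) -> str:
    return f"bar handling {normalize(payload)}"


def run_baz(payload: str) -> str:
    return f"baz handling {normalize(payload)}"


# One entry per deployable service; all of them live in the same program.
SERVICES = {"foo": run_foo, "bar": run_bar, "baz": run_baz}


def main(argv: list) -> str:
    parser = argparse.ArgumentParser(prog="app")
    parser.add_argument("service", choices=sorted(SERVICES))
    parser.add_argument("payload", nargs="?", default="")
    args = parser.parse_args(argv)
    # Pick the service by name, exactly as `app foo` would on the command line.
    return SERVICES[args.service](args.payload)


# Equivalent to invoking `app foo " Hello "` from a shell.
print(main(["foo", " Hello "]))
```

In a real entry point you would pass `sys.argv[1:]` to `main`. Each deployment then runs the same artifact with a different first argument, so the deployment topology stays a configuration detail and not a code-organization constraint.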
Figure out hard system boundaries between “your system” and “not your system”, and do design those carefully. Anything else that looks like a boundary isn’t. It is useful to spend some effort designing those things as well, but it’s more important to make sure that you can easily change them. A solid upgrade strategy for deployment trumps any design which seems perfect at a given moment in time.