One of the classic dilemmas in a tech organization is balancing work on new features and maintaining a healthy codebase. Even with the best efforts, it is rare that a company finds itself without tech debt or legacy software after a few years of existence (and it often doesn’t take years).
There are many ways to try to solve the problem from an organizational perspective: having bug-fixing sprints, teams focused on improving internal tooling, etc. At Dashlane, we choose to have dedicated squads with a strong focus on developer experience and maintainability of the codebase.
But once you find a solution that works for your company, the question remains: what work should you invest in? There is only so much time that can be invested in work which often has no immediate visible revenue impact. And there are generally many areas that could be improved, especially in a mature organization.
So, how can you decide what to prioritize and commit to a result? My recent experience building a roadmap with the Server team at Dashlane has led me to recognize some errors I made as the lead of this project, and how we addressed them as a team to better concentrate our efforts.
1. Choosing is refusing: don’t try to solve all problems at once
When I joined the Server team as a team lead, one of my first missions was to understand the technical state of our platform, and I was tasked by my manager to come up with a roadmap to identify our next priorities. To give you some context, the team is in charge of maintaining all the server-side applications needed to support our different product features, as well as maintaining the associated cloud infrastructure.
I organized workshops with the team to brainstorm about our needs on four architectural principles: scalability, resilience, adaptability, and operability. As expected, we came back from those workshops with plenty of ideas of things we could do better, ranging all the way from our APIs to our infrastructure on AWS and everything in between.
I then came back with five areas of investment, each containing several weeks-long chunks of work. In reality, it was a roadmap for at least two years of work given the number of people available to work on it. And with so many areas laid out as priorities, it was impossible to figure out how it would be executed.
What happened next is that I tried to spread the team thin to work on each area. That failed. It was also pretty demoralizing for the team to see all of the things we wanted to do yet so little progress on any task.
Only when we chose to address topics sequentially and do some proper prioritization did we manage to start delivering efficiently. It is often hard to admit, but you are not going to finish something faster if you start it sooner yet don’t have the critical number of people to do the work. It is much more efficient to do fewer things at once, but better.
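To make the arithmetic behind this concrete, here is a minimal sketch in Python. It is an idealized model, not something we ran at Dashlane: the function names, the project sizes, and the team size are all made up for illustration. It assumes that work divides perfectly among people and that context switching is free, both of which flatter the "spread thin" approach.

```python
def parallel_finish_weeks(project_sizes, team_size):
    """Finish week of each project when the team is split evenly across all of them.

    project_sizes: effort per project, in person-weeks (hypothetical numbers).
    Simplification: people are not reallocated when a project finishes early.
    """
    count = len(project_sizes)
    # Each project gets team_size / count people for its whole duration.
    return [size * count / team_size for size in project_sizes]


def sequential_finish_weeks(project_sizes, team_size):
    """Finish week of each project when the whole team tackles them one at a time."""
    finish_weeks = []
    effort_so_far = 0
    for size in project_sizes:
        effort_so_far += size
        finish_weeks.append(effort_so_far / team_size)
    return finish_weeks


def average(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)
```

With five projects of 12 person-weeks each and a team of four, both approaches finish the last project in week 15. Sequentially, however, the first result ships in week 3 and the average project is done in week 9, while in parallel nothing is delivered before week 15. In practice, coordination and context-switching costs tend to widen that gap.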
But focusing means leaving aside some topics that are considered important: to properly prioritize, you absolutely need to have a sharp understanding of the outcome of the work and the pain points that will be alleviated with this change. This leads me to the next point: you need to understand the problems you are trying to solve.
2. Uncovering the business outcome behind the technical solution
It is common to see discussions around technical changes centering on the solution rather than the problem.
Because of the fast-changing landscape of software, it is tempting for us engineers to advocate for a particular architecture trend, or new technology, and focus less on the pain point we are trying to solve.
It happened to me when I joined the team. I came from a company where we operated in a rather microservices-oriented architecture. (The term means different things to different people, but Martin Fowler’s description is quite close to what I knew.) This worked quite well for us, and microservices have become quite commonplace in the industry. I became accustomed to the idea that this type of architecture was the “new normal.” Hence my surprise, when I arrived at Dashlane, to discover that our backend stack was relatively monolithic.
At first, I was convinced that we needed to move to a microservices architecture, but the team was pretty split on the idea. (A lot of article-sharing and debates on microservices vs monoliths ensued.) After a lot of iterations and debates, and thanks to the perspective of an engineer who had joined the team much more recently, we decided to focus on answering the question that really mattered: which problem are we trying to solve?
A new series of discussions within the team uncovered the real pain points: it was hard to modify some parts of the codebase because there were many dependencies, integration tests were painful to write, and work was slowed down as a result.
We realized that the most important thing was to work on enforcing better domain boundaries between our components and isolating dependencies. We didn’t need microservices right now; instead we needed to come back to timeless architectural principles and work on cleaner code organization.
What would the outcome be? Among other things: more efficient onboarding for newcomers, faster development time for people working in separate teams. Certainly objectives that the business could get behind!
Being explicit about outcomes is crucial to get buy-in: the company needs visibility to understand how the money that is not invested in building features today is going to pay off tomorrow.
3. Measuring impact
Not everything is easy to measure, but it is important to give reasonable proof that the work is going to have a tangible impact.
When it comes to improving internal systems, at least two groups of metrics come to mind:
- Service Level Indicators: metrics linked to the level of service we want to provide for users, like uptime, latency… The SRE Book by Google is an invaluable resource when it comes to understanding how to set good SLIs and which metrics are key.
- Metrics linked to internal velocity: time to merge code, frequency of deploys, time to fix bugs…
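As an illustration of what these two groups of metrics can look like once computed, here is a minimal sketch in Python using only the standard library. It is not our actual monitoring code: the function names and input shapes are hypothetical, and in a real setup the raw numbers would come from your monitoring stack, your code hosting platform, and your deployment pipeline.

```python
import math
from datetime import datetime, timedelta
from statistics import median


def availability(successful_requests, total_requests):
    """Availability SLI: fraction of requests served successfully."""
    if total_requests == 0:
        return 1.0  # Assumption: no traffic counts as fully available.
    return successful_requests / total_requests


def latency_percentile(latencies_ms, percentile):
    """Latency SLI: nearest-rank percentile of request latencies."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


def median_time_to_merge(pull_requests):
    """Velocity metric: median delay between opening and merging.

    pull_requests: list of (opened_at, merged_at) datetime pairs.
    """
    seconds = [(merged - opened).total_seconds() for opened, merged in pull_requests]
    return timedelta(seconds=median(seconds))


def deploys_per_week(deploy_times, window_start, window_end):
    """Velocity metric: average number of deploys per week in a time window."""
    in_window = [t for t in deploy_times if window_start <= t < window_end]
    weeks = (window_end - window_start) / timedelta(weeks=1)
    return len(in_window) / weeks
```

For example, `availability(999, 1000)` gives 0.999, and `latency_percentile(samples, 95)` gives the latency that 95% of sampled requests stayed at or under. The point is less the formulas than having a baseline number before and after each improvement.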
Putting in place the right monitoring infrastructure is often hard and requires significant engineering effort. In retrospect, one of the mistakes I made was to consider better monitoring as one of the objectives, and not as a prerequisite to everything else. We could have decided what to prioritize more easily if we had been able to rely on clear numbers for every improvement we wanted to make.
That said, I still don’t believe that everything should be measured. It is also ultimately a matter of experience to be able to recognize a high-performing or unhappy team when you see one. Is it worth investing time to know the average time it takes to merge code if no one in the team is complaining about the velocity of code reviews? Everything takes time, even just aggregating and regularly reporting on a metric. Sometimes, the problems are so obvious that just listening to people will be enough to understand what to prioritize.
- Do fewer things at once, but better
- Understand the business impact of the work you are planning before choosing a solution
- Make sure you can measure impact as far as possible, and measure what you can before implementing changes