Feb 28 | 05 min read
Written by Lalit Bhise & Nikhil Chhaochharia
The last time I worked closely with engineers was back in 2017. At that time we felt that my heading sales and engineering, apart from being CEO of the whole organization, was untenable, and that we needed more leaders to take responsibility for some of those tasks. Luckily we found Nikhil, who was a great culture fit with great skills in scaling engineering teams. So after handing engineering over to Nikhil, I moved out and focused on business.
As tech pioneers in CPG, we have always been at the bleeding edge of technology. Apart from launching a retailer app to facilitate omnichannel operations back in 2016, we had another large initiative going on, code-named “Eagle Eye”. Eagle Eye started as a place where all our research in AI, ML, computer vision, big data analytics, and industry trends was put together. We have been trying to build products around that research, with varying degrees of success.
Somewhere in June 2020, at the height of the Covid-induced lockdowns, a few key members of the Eagle Eye team decided to move on. Nikhil asked if I could help with engineering management in Eagle Eye, at least for a short while. I was scared, since I had not managed any engineering teams since 2017, but Eagle Eye is very close to my heart. This is the tech platform that is helping us build a “smarter Bizom” and grow into a $100m-and-beyond company faster.
After some debate, I said ok and started participating in the daily/weekly meetings with the Eagle Eye squad around July/August 2020. My first decision was to ask the team to move from fortnightly releases to weekly ones. While everyone said ok, the first weekly release happened after 2 weeks. The same happened with the 2nd and 3rd releases. By now all the seniors in the team were rebelling, saying that weekly releases were impossible. Some of them were planning to move on around this time, so I started talking with the next level of engineers to understand the real underlying problems.
Following were the main problems highlighted
Then came the dreaded question. Shall we rewrite Eagle Eye? The answer was a unanimous yes. Engineering bravado suggested that it could be done in a sprint or two. More experienced heads suggested that it could surely be done in 6-8 weeks. With that in mind, the team got to work.
This stage was possibly the most important for our re-write. The team put in their suggestions about improving Eagle Eye in a Google form. For every squad, it’s a must that the team feels ownership of the tasks ahead. If you need ownership from the team, you must give them the authority to define the roadmap. This step of asking for suggestions from everyone helped a lot in getting the buy-in. With the feedback as a base, we created sprint plans for execution.
The debates around design took us a couple of weeks, but they were well worth it. This is where we collaborated. Designing the database restructure was owned by Rohit, Surya, and Utkarsh. The algorithmic redesign was mainly owned by Ankit with support from Surya. The API code restructure was owned by Shubham, Sanyam, and Utkarsh.
The biggest debate around the design was which database to use. Some of us were of the opinion that MySQL was best simply because we have a lot of expertise in the team and hence it is easy to maintain. They also argued that MySQL keeps adding features that would help with our roadmap anyway, so experimenting with unknowns was not needed. Others argued that PostgreSQL had better geospatial capabilities, or that NoSQL databases like MongoDB were more future-proof. What we did was agree on a set of evaluation criteria, apply them to all the database options, and choose the one that came out best. This helped a lot in resolving disputes and avoiding any sense of unhappiness about our eventual choice. I suggest that all squads debating the right methodology, tools, or vendors use transparent evaluation criteria to resolve the dispute.
Like in life, there is no guarantee that the final decision will be right but the least you can aim for is that the whole team is convinced about the decision.
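For squads who want to try this, the transparent evaluation described above could be sketched as a simple weighted scoring matrix. This is only an illustration: the criteria names, the weights, the 1-5 scale, and the `option_a`/`option_b` scores below are all hypothetical placeholders, not the criteria or results the Eagle Eye team actually used.

```python
# Hypothetical criteria and weights (assumption: weights sum to 1.0).
# These are NOT the actual criteria used by the Eagle Eye team.
WEIGHTS = {
    "team_expertise": 0.4,
    "geospatial_support": 0.3,
    "ease_of_maintenance": 0.3,
}

# Placeholder scores on an assumed 1-5 scale, agreed on by the whole team.
EXAMPLE_OPTIONS = {
    "option_a": {"team_expertise": 5, "geospatial_support": 2, "ease_of_maintenance": 4},
    "option_b": {"team_expertise": 3, "geospatial_support": 5, "ease_of_maintenance": 3},
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted sum of one option's scores."""
    missing = set(weights) - set(scores)
    if missing:
        # Every option must be scored on every criterion to keep it fair.
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def rank_options(options, weights=WEIGHTS):
    """Return (option, score) pairs sorted from best to worst."""
    scored = [(name, weighted_score(s, weights)) for name, s in options.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_options(EXAMPLE_OPTIONS):
        print(f"{name}: {score:.2f}")
```

The value is less in the arithmetic than in agreeing on the criteria and weights before anyone scores their favourite option, so that the final ranking is something the whole team has already signed up to.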
With the plan in place and design structure in our heads, we were ready to execute.
This was the phase where I started letting the team take more responsibility and lead. They did it really well. The biggest outcome of this phase was that we not only merged our algorithms into a single workflow, replacing the original bunch of scripts, but with the new architecture we could also refresh pan-India data over a weekend. (Earlier, it would take 48 hours for 1 city; pan-India was out of the question.) The coding phase took longer than we estimated, but it was finally over in 3 sprints. One experiment we did around this time was to allow everyone to code everywhere and break the myth of the coding expert. We wrote code in multiple modules and tech stacks during that time, sometimes with zero prior experience in that specific tech. Whenever we were badly stuck, suggestions from Nikhil/Aravinda helped us move forward. By the end of this phase, we had a team where almost everyone could code across the stack. My learning as a leader was to trust the team more in execution, refrain from giving too much gyan, and get external experts involved as needed.
Although we had finished coding, the team was extremely worried about promoting this to prod. The main concern was the dependency of the Retailer App squad on Eagle Eye. We were all worried that the change in data models, along with the complete architecture change, might break some of the semantics of the interface between Eagle Eye and the Retailer App. This is where I took the reins and fixed a release day. People had to work backwards from that day: keep all data ready, keep code tested on dev and staging, etc.
When the day came, I still remember the standup. Aravinda (our architect), Ankit, Surya, basically everyone said we could not release, since 3 dependencies were still not fixed or we were not sure about them. Our original design for the rollback was not viable. I was obviously unhappy. This is when Aravinda suggested we create a document listing the things we would need to do if our deployment failed (a rollback checklist). Surya took ownership of the document. With this new information on how we would roll back if the deployment failed, we geared up for another deployment.
The 2nd time, when we started deployment, we again faced a myriad of problems: some due to our lack of preparation, some due to our lack of skills, some purely outside our control. But knowing everything we would have to do if the deployment failed meant we had to push on to complete it. We dreaded the rollback, knowing how hard it was. Sourabh, Surya, Utkarsh, Shubham, and Ankit spent almost 48 hours straight without any sleep or break to ensure that it was successfully done, with almost no interface breaking between Eagle Eye and any other module dependent on it (retailer app, bourbon, and one view). In a lot of ways that was the night that built the character of the team and its leaders. Writing down what would happen if we failed removed failure as an option entirely.
What I learned from trying to be an engineering manager again
Thanks, EE team. Keep rocking in whichever squad you move on to next. I am sure there is a leader in each of you!
Lots of love,