Architecture for Growth
Weekly Architecture Insights
Weekly insights on building and scaling engineering teams that thrive
Architecture Decisions Explained
Deep dives into the architectural choices that make or break scaling teams - from monolith to microservices and beyond.
Real-World Case Studies
Learn from actual projects and decisions - the wins, the failures, and the lessons. Every week, a new story from the trenches.
Practical Tools & Frameworks
Decision trees, checklists, and templates you can use immediately. No fluff, just actionable resources for engineering leaders.
About Architecture for Growth
Every week, I share insights from 15+ years of helping startups scale their engineering teams. From architecture decisions to team dynamics, I cover what CTOs and engineering leaders actually need to know.
- Learn how to evolve from a small crew into a high-performing engineering organization with clear processes, effective tooling, and proven frameworks.
- Every technical decision should be rooted in strategic goals. Learn to design architectures that reduce complexity, control costs, and accelerate time-to-market.
- Get actionable frameworks that anticipate industry changes, helping you adapt quickly, innovate confidently, and maintain a competitive edge.
Featured Post: Product Matching & Catalog Deduplication at eMAG
Challenge
eMAG needed to eliminate duplicate product entries from its massive online catalog. Manually identifying duplicates (like multiple variations of the same book or different listings of the same smartphone) was time-consuming and error-prone. The company sought an automated solution that accurately flagged duplicates, even among millions of products and complex attributes.
Approach
The team indexed the catalog in Apache SOLR for flexible search and defined matching rules in a custom meta-language, which let them process huge volumes of product data. They used TF-IDF weighting to measure textual similarity and added Word2Vec embeddings to capture subtle linguistic variations (e.g., “smartphone” vs. “mobile phone”). Instead of starting with a fully automated system, they began by offering human operators the top suggestions, then refined the rules based on the operators’ choices, so each round of feedback improved the matching.
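To make the "top suggestions" step concrete, here is a minimal sketch of ranking candidate duplicates by TF-IDF cosine similarity. It is an illustration only, not eMAG's implementation: the function names, the sample catalog, and the smoothed IDF formula are assumptions, and a search engine such as SOLR would apply its own scoring rather than this hand-rolled version.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would normalize much more.
    return text.lower().split()


def build_tfidf(titles):
    """Return one sparse TF-IDF vector (dict of term -> weight) per title."""
    docs = [Counter(tokenize(t)) for t in titles]
    n_docs = len(docs)
    # Document frequency: number of titles that contain each term.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        # Smoothed IDF (an assumed variant) keeps weights positive for common terms.
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in doc.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_suggestions(titles, query_index, k=3):
    """Rank all other titles by similarity to titles[query_index]."""
    vectors = build_tfidf(titles)
    scored = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scored.sort(reverse=True)
    return [(titles[i], round(score, 3)) for score, i in scored[:k]]


# Hypothetical listings, invented for this example.
catalog = [
    "Samsung Galaxy S21 128GB black smartphone",
    "Smartphone Samsung Galaxy S21 128GB black",
    "Samsung Galaxy S21 protective case",
    "Dune paperback Frank Herbert",
]
suggestions = top_suggestions(catalog, query_index=0, k=2)
for title, score in suggestions:
    print(score, title)
```

In this sketch the reordered listing ranks first because it contains exactly the same words, and the case listing ranks second because it shares only the brand and model terms. A bag-of-words score like this cannot tell that “smartphone” and “mobile phone” mean the same thing, which is the gap the post describes filling with Word2Vec.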
Technologies & Methods
- SOLR for indexing and search
- Schema-less approach to store diverse product attributes
- Custom meta-language for flexible matching rules
- Word2Vec for semantic similarity
- Incremental feedback-driven enhancements
Result
The solution eventually exceeded 98% accuracy, drastically reducing operator workload and producing a cleaner, more reliable product catalog. The result was a better customer experience, fewer redundant listings, more trustworthy product matches, and a foundation for scalable, automated catalog management.