Barrels of Data

Is LakeDB set to be the evolution of Data Lakehouse?

The data lakehouse is one of today's go-to data platform architectures. It offers robust capabilities like ACID transactions, inserts, updates and deletes over massive piles of data, while also providing time travel, schema evolution and query optimization through metadata. Open table formats like Iceberg, Hudi and Delta have made it possible to get a lakehouse up and running quic […]

Technical debt and how to avoid it

Technical debt is what we accumulate when we take quick and easy shortcuts during development instead of opting for a more robust and generic solution. We promise to fix them later, i.e., to pay the debt, yet forget about it. This results in a growing list of TODOs and refactorings, debt-ridden code and an overall loss of confidence in the platform. Why you accumulate technical debt: The prime cause of accumu […]

Processing personal data of users while respecting privacy protection laws

When working on projects that collect personally identifiable information of users (referred to as PII hereafter), it is a legal obligation to comply with local privacy protection laws such as GDPR and CCPA. However, this does not mean that whole teams and business units should be blocked from using this data without a valid reason to access PII. Here are a few techniques that I have used or come ac […]

Writing tests for an Apache Beam sliding-window-based streaming pipeline with late event triggers

I've seen engineers struggle to write functional tests for streaming pipelines, especially when event ordering is important and late-arriving data adds complexity. In this post, I will explain how to control the watermarks and test a sliding-window-based aggregation logic, while also considering late events. If you are new to windowing in Apache Beam, I suggest you go through the concepts bef […]

Running the LLaMA AI Language Model on a Laptop

LLaMA is an open-source large language model built by Meta. It is quite small compared to similar models like GPT-3, so it has the potential to run on everyday hardware, at least for fun, like I did. It is impressive how complex AI models like these can be packaged into files of a few gigabytes and launched anywhere. The trained model of LLaMA was only made available to resear […]
