inlumi’s customers are increasingly moving their infrastructure into the cloud. As such, we are actively helping our customers to plan their cloud migrations, facilitate communication between the business and IT departments and perform the actual migrations.
As the market starts adopting the latest trends, vendors are getting ready to shift their focus from cloud to machine learning. In this post, we discuss 15 years of market trends and buzzwords in the analytics space. The trends often have unintended side effects and can be costly when implemented incorrectly. Read on to see if you recognise any of these symptoms and pitfalls.
One Version of the Truth
I started working with analytics nearly fifteen years ago. The market trend at the time was data consolidation through data warehouses, and the accompanying buzzword was "one version of the truth". Our customers were heavily focused on building large, structured databases in equally large database projects.
In the projects I was involved in, the database took about two years to build, and the analytics team started building analyses on top of it one year into the project. This had two advantages: there was already plenty of structured data to work with, and because the database was not yet finished, the needs of the chosen analytical tools could easily be incorporated into the ETL process.
The natural evolution of this trend was that the databases grew very large. The database teams had to balance opposing forces: add the new data sources demanded by the business, or fulfil the performance SLAs demanded by the same business? Offloading data into smaller data marts solved some of the issues, but server and licence costs still grew.
At this time a new trend appeared: relatively cheap, clustered, off-the-shelf servers running open source database software. The accompanying buzzword was "big data". Querying database clusters can be tricky, and there are pitfalls that can ruin your performance. Even so, big data technology has evolved and matured greatly and is a good complement to data warehouses.
Often the data warehouse department had licences for, and expertise in, a particular software stack. This could relegate the big data implementations to small, understaffed and under-funded departments. After all, the software was supposed to be free, right? Except that the man-hours that go into building a data warehouse are not free.
Data lakes and data capital
Through big data clusters, companies now had the ability to store vast amounts of data at a fraction of the traditional cost. This gave rise to “data lakes”, where data was simply saved and made accessible to data scientists for analysis.
“Data capital” was the notion that companies should save all the data they can, because you can only mine insights out of data that you saved. The more data that you save, the more insights and advantage you have over your competition. Gone were the days of costly ETL projects.
There is merit to data lakes and to the notion of data capital. However, a data lake should be used as a tool for data scientists to give your organisation an edge over the competition. It is not a replacement for a structured data warehouse that is designed to reflect your business, its processes and your management KPIs.
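To make the difference concrete, here is a minimal Python sketch. It is only an illustration: the order events, the field names and the `orders` table are hypothetical, and the "schema-on-read" versus "schema-on-write" framing is my shorthand rather than anything a particular product prescribes. The lake side simply reports which fields turn up in the raw data, while the warehouse side only accepts complete rows and can therefore answer a KPI question reliably.

```python
import json
import sqlite3

# Hypothetical raw events as they might land in a data lake: no enforced schema.
RAW_EVENTS = [
    '{"order_id": 1, "region": "north", "amount": 120.0}',
    '{"order_id": 2, "region": "south", "amount": 80.0, "coupon": "SPRING"}',
    '{"order_id": 3, "region": "north"}',  # amount is missing
    '{"order_id": 4, "region": "north", "amount": 50.0}',
]


def explore_lake(raw_events):
    """Schema-on-read: parse whatever is there and count how often each field occurs."""
    field_counts = {}
    for line in raw_events:
        for key in json.loads(line):
            field_counts[key] = field_counts.get(key, 0) + 1
    return field_counts


def load_warehouse(raw_events):
    """Schema-on-write: only complete rows enter the structured table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, "
        "region TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )
    rejected = []
    for line in raw_events:
        record = json.loads(line)
        try:
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (record["order_id"], record["region"], record["amount"]),
            )
        except (KeyError, sqlite3.IntegrityError):
            # Incomplete or duplicate rows are set aside for follow-up.
            rejected.append(record.get("order_id"))
    return conn, rejected


def revenue_by_region(conn):
    """A management KPI answered from the structured table."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(explore_lake(RAW_EVENTS))
    warehouse, rejected_ids = load_warehouse(RAW_EVENTS)
    print(revenue_by_region(warehouse), rejected_ids)
```

The data scientist exploring the lake learns that a `coupon` field exists and that one order has no amount. The KPI query never sees the incomplete order, which is the kind of guarantee management reporting depends on.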
The latest market trend is the one we are currently experiencing: the move to the cloud. Market forces and economies of scale urge companies to move more and more of their IT infrastructure into the cloud.
Initially, organisations were hesitant to move their data into the cloud. No major technological advancement has resolved those initial fears. Instead, people have grown used to having their social lives, private conversations and pictures in the cloud, and businesses are slowly getting used to the notion of having their emails, data and tools there as well.
A migration to cloud infrastructure can be very beneficial and open up resources for organisations to focus on their core business.
A move to the cloud should of course happen for the right reasons. Managers who think that their IT department is slow are tempted to whip out the corporate credit card and take matters into their own hands. For small departmental needs, this approach could be fine.
However, the IT department usually handles things like governance, IT security, auditability and documentation, as well as application hosting. All of these are still highly relevant, and the processes are there for a reason, even though they might slow down progress.
The evolution of the market trends should not be perceived as new technology replacing old technology. It should rather be perceived as tool-making, where you build upon one set of tools to make new, different tools.
- First, you organise your core data, your business processes and your KPIs.
- Then you see if you can offload some of that data from relational databases into big data clusters. The two should work side by side, each with its own strengths and weaknesses.
- Once big data is widely adopted in the organisation, you can add data lakes to your toolkit. These do not replace the well-organised databases. They simply give data scientists a chance to find new ways to improve your business.
- Lastly, assess which parts of your data and your business tools you can safely move to the cloud. This step should be done in close collaboration with the IT department.
If you need help with analytics-related issues and cloud migrations, don’t hesitate to contact me or my inlumi colleagues near you.