How open source is driving the big data market

Published on 28/06/2017 | Written by Steve Singer


There is a clear split between legacy and next-generation approaches to software development, writes Steve Singer…

Legacy vendors in the big data space generally have internal development organisations, dedicated to building proprietary, bespoke software. It’s an approach that has worked well over the years – but it is being supplanted by open source approaches.

That’s because the big data market has always moved fast and it’s had an element of open source from the beginning; for example, a large proportion of what the major Hadoop vendors deliver is based on open source. These vendors, and those building complementary technology, are best placed to take advantage of new big data trends (like Spark Streaming) and build solutions that add value to customers.

Traditional legacy and proprietary approaches to data integration still have their place. These vendors have solid products, reliable technology and well-funded development teams. However, their products are typically built on traditional architectures that may not adapt easily to big data environments.

These products may work effectively for businesses that are doing things the way they always have. If requirements around data integration, data quality and ETL are straightforward, and existing processes and approaches are staying put, there may be little need to move away from the proprietary vendors entrenched in your information architecture.

The difficulty comes when organisations want to launch new projects or drive business transformation or product refreshes. Such moments can bring concerns around cost and flexibility.

For example, an organisation seeking additional functionality around big data ingestion may find legacy vendor licensing becoming an issue. Buying perpetual software means major upfront costs, and once the decision is made it can be hard to modify or partially cancel the licence should business needs change (or should the product fail to deliver).

In addition, traditional legacy architectures are often unwieldy, and it can be difficult for businesses to adapt to evolving big data projects or environments.

By contrast, a flexible, subscription-licensed open source environment offers multiple benefits for businesses that want to explore big data. Subscription models let organisations dip a toe in, and when licensing costs are lower, as they often are, they can experiment without major overhead.

There’s more to it than the cost argument. The collaborative, partnership approach to product development associated with open source means the ability to tap into the work of entire communities, potentially accelerating the pace of innovation.

If you think about the latest high-impact big data Apache projects, for example, there are multiple organisations and individuals focused on the development of each one as well as the creation of new projects.

Such are the benefits it delivers that open source is becoming a standard approach in the big data arena. It is helping to drive innovative new technologies like Apache Spark and subsequently Spark Streaming, as well as helping to fuel emerging projects like Apache Beam.

While it is easy for open source vendors to support such projects, it often takes a major ‘crowbarring effort’ for legacy vendors to do so. And, often by the time they do, the rest of the world has moved on.

Just as the cloud has moved from disruptive force into the mainstream, the same process is now happening to open source. That’s why growing numbers of businesses in the big data integration field are adopting an ‘open source first’ approach.


Steve Singer is ANZ Country Manager, Talend.

