Big Data and smart content: New challenges for content management applications


Guest post by Eric Barroca

Content is getting bigger--way bigger--and this is scary to many technologists. At the same time, it's also getting smarter, bringing more complexity and sophistication to applications, but also more options for information and enterprise architects. How will these changes impact content management technologies? It's difficult to predict exactly, but there is insight to be found and used to plan for the future.

Bigger by the minute

If there's one topic that keeps cropping up when it comes to content management, it's the continuous and seemingly unstoppable growth of data and content. Accelerated growth, combined with new requirements and new sets of tools and technologies, is a direct consequence of enterprise software's move to the web. The sheer numbers, which are covered in most enterprise content management (ECM) analyst reports, also extend to all aspects of the information technology sector, prompting developers to create a new generation of software and technology, such as NoSQL databases and distributed computing frameworks like Hadoop, in an effort to cope with this scale.

As a content management practitioner and information management professional, one might not be completely aware of things like "Big Data" and "NoSQL." However, these topics are generating much attention, not only in the developer community but also at a higher level, as IT decision makers begin to question their commitment to specific technology providers.

Content growth is everywhere, in every nook and cranny of information systems. From traditional data warehouses to new consolidated Big Data stores, every piece of our IT infrastructure must be ready for this continuing growth. It impacts the entire IT industry, including the ECM technology landscape, and cannot be ignored.

Smarter by the second

ECM technology is evolving towards a platform-based approach, enabling organizations to make their own content-centric and content-driven applications smarter. This is another phenomenon that is regularly discussed amongst prominent analysts, innovative vendors and users on the ground.

The time for "out-of-the-box" CMS applications has passed. Each project can now build a solution that meets its own specific needs and requirements, and that works in a smarter way than its predecessors did.

One thing to note is that content and data, more often than not, come with embedded intelligence, whether through additional custom metadata, in-text information (deciphered by smarter algorithms and systems), or attached media and binary files. This makes it possible to build applications that draw on the power of both structured and unstructured content.

This can be observed on many different levels across various domains. For instance, with the arrival of what some have started to call Web 3.0, the semantic web and its related technologies draw intelligence out of raw content through advancements like semantic text analysis, automated relations and categorization, and sentiment analysis--effectively giving meaning to data.

More traditional components of ECM, such as workflow and content lifecycle management, demonstrate much of the same. Smart content architecture, intelligent and adaptive workflows and processes, and deep integration with the core applications within information systems are all making enterprise content-centric applications smarter and refining the way intelligence is brought to content.

In short, content is getting smarter on the inside as much as on the outside.

It's an evolution, not a revolution

For technologists, some preconceived and simplistic notions must be left behind if they are to proceed effectively with these developments. It's an exciting time to watch technologies such as NoSQL databases and other Big Data systems evolve. Because developers often love to hate legacy technology and like to be innovators who reinvent things, many might say traditional relational database applications belong to the past. These folks may want to ban traditional SQL as well as other mature technology, and begin to lobby for the adoption of new technologies, such as NoSQL-based document storage, to build their applications from scratch. From an architecture point of view, this is not the best approach and is not the way content management technology should evolve.

For years, relational databases have been developed based on real business requirements, and the same is true for web application frameworks and content management systems. They have all implemented functionality for specific use cases that are still valid, but are simply evolving. In fact, disruptive phenomena such as Big Data and the new semantic technologies are huge opportunities for enterprise content management solutions. They bring new solutions and possibilities in business intelligence, semantic text analysis, data warehousing and caching that need to be integrated into existing content-centric applications, all without rewriting them.

As a result, Big Data and smart content will push more of enterprise content management towards technical features such as software interoperability, extensibility and integration capabilities. 

These developments will also demand a clean and adaptive architecture that is flexible enough to evolve as new standards and projects arise, such as the Stanbol project, which bridges CMS and semantic technologies, as well as connectors to back-end storage systems like NoSQL document stores and connectors to text-analysis solutions.
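To make the idea of a connector-based architecture concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the design of any particular product: every name in it (`Document`, `Connector`, `InMemoryStore`, `KeywordTagger`, `ContentPlatform`) is hypothetical, the store is a plain dictionary standing in for a real document store, and the "text analysis" is simple keyword matching standing in for a real semantic engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Document:
    """A content item: raw text plus free-form metadata."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


class Connector(Protocol):
    """Hypothetical plug-in contract: a store, an analyzer, and so on."""
    def handle(self, doc: Document) -> None: ...


class InMemoryStore:
    """Stand-in for a back-end storage connector (assumption: a dict, not a real database)."""
    def __init__(self) -> None:
        self.saved: Dict[str, Document] = {}

    def handle(self, doc: Document) -> None:
        self.saved[doc.doc_id] = doc


class KeywordTagger:
    """Stand-in for a text-analysis connector; real semantic analysis is far richer."""
    def __init__(self, keywords: List[str]) -> None:
        self.keywords = keywords

    def handle(self, doc: Document) -> None:
        lowered = doc.text.lower()
        doc.metadata["tags"] = [k for k in self.keywords if k in lowered]


class ContentPlatform:
    """Core application: depends only on the Connector contract, not on concrete systems."""
    def __init__(self) -> None:
        self.connectors: List[Connector] = []

    def register(self, connector: Connector) -> None:
        self.connectors.append(connector)

    def ingest(self, doc: Document) -> Document:
        # Pass the document through each registered connector in order.
        for connector in self.connectors:
            connector.handle(doc)
        return doc
```

In this sketch, swapping the in-memory store for a different back end, or adding a second analyzer, means registering another connector; `ContentPlatform` itself does not change, which is the kind of extensibility the paragraph above describes.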

This underscores the advancements made in the development of modular and extensible platforms for content-centric applications. The traditional approach of large enterprise content management suites that rely on older software architecture will have a harder time leveraging these new and nimble opportunities.

In order to get the most value out of smart content and refine the way we deal with Big Data, enterprise content management architects must build on a modern, well-designed content management platform, one that not only delivers end-user features but also serves the needs of developers. Enterprise content management will not be reinvented. Rather, Big Data and smart content are evolutions, not revolutions, in the industry.

As Chief Executive Officer of Nuxeo, Eric Barroca brings an unparalleled passion for and commitment to software. Barroca joined Nuxeo as the 5th employee, and has since worn almost every hat one could wear at a single company. Over the past 11 years, he has been dedicated to making a difference in the content management market by creating software for developers with an emphasis on quality, modularity and agility. You can follow him on Twitter @ebarroca.