Information Architecture (IA) is an enabler for Big Data analytics. You may be asking why I would say this, or how IA enables Big Data analytics. We need to remember that Big Data includes all data: unstructured, semi-structured, and structured. The primary characteristics of Big Data (Volume, Velocity, and Variety) challenge your existing architecture and your ability to process data effectively, efficiently, and economically to achieve operational efficiencies.
To derive the maximum benefit from Big Data, organizations must be able to handle the rapid delivery and extraction of huge volumes of data of varying types. This data can then be integrated with the organization’s enterprise data and analyzed. Information Architecture provides the methods and tools for organizing, labeling, building relationships (through associations), and describing (through metadata) your unstructured content, adding this source to your overall pool of Big Data. In addition, Information Architecture enables rapid exploration and analysis of any combination of structured, semi-structured, and unstructured sources. Big Data requires Information Architecture to exploit relationships and synergies between your data. This infrastructure enables organizations to make decisions using the full spectrum of their Big Data sources.
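As a minimal sketch of what organizing, labeling, relating, and describing content can look like in practice, the Python snippet below models a content item with labels, metadata, and associations. The class name `ContentItem`, its fields, the helper `associate`, and the sample values are all hypothetical and chosen only for illustration; they do not come from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """Hypothetical record describing one piece of content."""
    item_id: str
    title: str
    structure: str  # "structured", "semi-structured", or "unstructured"
    labels: set = field(default_factory=set)        # labeling
    metadata: dict = field(default_factory=dict)    # describing (metadata)
    related_ids: set = field(default_factory=set)   # relationships (associations)


def associate(a: ContentItem, b: ContentItem) -> None:
    """Record a two-way association between two content items."""
    a.related_ids.add(b.item_id)
    b.related_ids.add(a.item_id)


# Illustrative sample data (invented for this sketch).
report = ContentItem("doc-1", "Q3 sales report", "unstructured",
                     labels={"sales"},
                     metadata={"format": "pdf", "owner": "finance"})
orders = ContentItem("tbl-7", "Orders table", "structured",
                     labels={"sales"},
                     metadata={"format": "sql-table", "owner": "operations"})

associate(report, orders)
print(sorted(report.related_ids))
```

Once unstructured and structured items share the same labels and associations, they can be explored together as one pool of data.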
Big Data Components
IA Element | Volume | Velocity | Variety
Content Consumption
Volume: Provides an understanding of the universe of relevant content by performing a content audit. This contributes directly to the volume of available content.
Velocity: Contributes directly to the speed at which content is accessed by establishing the initial volume of available content.
Variety: Identifies the initial variety of content that will be part of the organization's Big Data resources.
Content Generation
Volume: Fills gaps identified in the content audit by gathering the requirements for content creation/generation, which contributes directly to increasing the amount of content available in the organization's Big Data resources.
Velocity: Directly affects the speed at which content is accessed, because volumes are increasing.
Variety: Contributes to the creation of a variety of content (documents, spreadsheets, images, video, voice) to fill identified gaps.
Content Organization
Volume: Content Organization provides business rules to identify relationships between content and creates a metadata schema to assign content characteristics to all content. This increases the volume of data available and, in some cases, leverages existing data to assign metadata values.
Velocity: Directly improves the speed at which content is accessed by applying metadata, which in turn gives context to the content.
Variety: The variety of Big Data will often drive the relationships and organization among the various types of content.
Content Access
Volume: Content Access is about search and establishing the standard types of search (i.e., keyword, guided, and faceted). This contributes to the volume of data because establishing the search parameters often adds metadata fields and values to enhance search.
Velocity: Contributes to the ability to access content and to the speed and efficiency with which content is accessed.
Variety: Contributes to how the variety of content is accessed. The variety of Big Data will often drive the search parameters used to access the various types of content.
Content Governance
Volume: The focus here is on establishing accountability for the accuracy, consistency, and timeliness of content, content relationships, metadata, and taxonomy within areas of the enterprise and the applications being used. Content Governance will often "prune" the volume of content available in the organization's Big Data resources by allowing access only to pertinent/relevant content, while either deleting or archiving other content.
Velocity: When the volume of content available in the organization's Big Data resources is trimmed through Content Governance, velocity improves because a smaller, more pertinent universe of content is made available.
Variety: When the volume of content available in the organization's Big Data resources is trimmed through Content Governance, the variety of content available may be affected as well.
Content Quality of Service
Volume: Content Quality of Service focuses on the security, availability, scalability, and usefulness of content, and improves the overall quality of the volume of content in the organization's Big Data resources by:
- defending content from unauthorized access, use, disclosure, disruption, modification, perusal, inspection, recording, or destruction
- eliminating or minimizing disruptions from planned system downtime
- making sure that the content accessed is from and/or based on the authoritative or trusted source, reviewed on a regular basis (based on the specific governance policies), modified when needed, and archived when it becomes obsolete
- enabling the content to behave the same no matter what application/tool implements it, and keeping it flexible enough to be used at an enterprise level as well as a local level without changing its meaning, intent of use, and/or function
- tailoring the content to the specific audience and ensuring that the content serves a distinct purpose, is helpful to its audience, and is practical
Velocity: Content Quality of Service will eliminate or minimize delays and latency in your content and business processes, speeding analysis and decision-making and thereby directly affecting the content's velocity.
Variety: Content Quality of Service will improve the overall quality of the variety of content in the organization's Big Data resources through aspects of security, availability, scalability, and usefulness of content.
The table above aligns key Information Architecture elements to the primary components of Big Data. This alignment provides a consistent structure for effectively applying analytics to your pool of Big Data. The Information Architecture elements are Content Consumption, Content Generation, Content Organization, Content Access, Content Governance, and Content Quality of Service. It is this framework that will align all of your data so that business value can be gained from your Big Data resources.
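To make the alignment a little more concrete, here is a small Python sketch of three of the elements acting on a toy content catalog: a content audit (Content Consumption), a faceted search (Content Access), and pruning by review date (Content Governance). The catalog entries, field names, function names, and the cutoff date are all assumptions invented for illustration; a real implementation would depend on your own content stores and governance policies.

```python
from collections import Counter
from datetime import date

# Hypothetical catalog: each entry is one content item with metadata.
catalog = [
    {"id": "doc-1", "type": "document", "topic": "sales", "last_reviewed": date(2024, 5, 1)},
    {"id": "img-2", "type": "image", "topic": "marketing", "last_reviewed": date(2019, 3, 12)},
    {"id": "xls-3", "type": "spreadsheet", "topic": "sales", "last_reviewed": date(2023, 11, 20)},
    {"id": "doc-4", "type": "document", "topic": "hr", "last_reviewed": date(2018, 7, 4)},
]


def audit(items):
    """Content Consumption: report volume (item count) and variety (count per type)."""
    return len(items), Counter(item["type"] for item in items)


def faceted_search(items, **facets):
    """Content Access: keep items whose metadata matches every requested facet."""
    return [item for item in items
            if all(item.get(name) == value for name, value in facets.items())]


def prune(items, cutoff):
    """Content Governance: split items into active and archived by review date."""
    active = [item for item in items if item["last_reviewed"] >= cutoff]
    archived = [item for item in items if item["last_reviewed"] < cutoff]
    return active, archived


volume, variety = audit(catalog)
active, archived = prune(catalog, cutoff=date(2022, 1, 1))
sales_docs = faceted_search(active, topic="sales", type="document")

print(volume, dict(variety))
print([item["id"] for item in sales_docs])
print([item["id"] for item in archived])
```

In this toy run, pruning reduces the searchable universe from four items to two, which is the sense in which governance trims volume and can also narrow variety (the image type drops out of the active set).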
Note: This table originally appeared in the book Knowledge Management in Practice (ISBN: 978-1-4665-6252-3) by Anthony J. Rhem.