DevCon5 Speaker: Bhagvan Kommadi

Check out: speaking at DevCon5, Aug 1-4 at NYU, USA: “Big Data Traps”


“Big data” projects often give CXOs short-lived confidence about the information they are gathering. As a result, precious time and resources are wasted chasing the wrong targets, and business opportunities are missed. No doubt big data provides the insight needed for decision making, but without strong analytics and an understanding of the enterprise structure, it will end up as another wild chase.

To start with, big data author Nate Silver points out that no matter how much data you collect, the objective truth that comes from the data pretty much stays the same. Then there is what Yves Morieux calls the “shadow of the future”: in this case, entering inaccurate data negatively affects the overall customer experience.

The classic traps shared by CXOs are:

  • You can’t manage what you don’t measure
  • Meaningless metrics (just because you can measure something doesn’t mean you should manage it)
  • The ability to process huge amounts of data means success in business
  • Big data is a black box
  • Big data can predict the future
  • You are wrong, big data results are correct
  • Chief executive data analyser effect
  • Data does not make decisions, people do
  • There’s no magical end point; big data based decision making is tied to continuous improvement and iterative decision making
  • No key performance indicators

Role of Metrics:

The person analyzing the data should be the person who best understands the context and source of the data (for example, the sales team can analyze salesforce data). Differentiation in the market brings success, not blindly copying market activities and trends. Knowing what is happening in the field is important. Identify key process areas (KPAs) and, for each KPA, identify key performance indicators. Analytics can take your decision making only up to a certain limit.

Echo-chamber effect: when an expected outcome is reinforced throughout the process.

Data Relationship

False Positives

There is the problem of “false positives.” If we look at 200 variables and the relationships between them, we have 200 × 200 = 40,000 possible pairings (19,900 distinct pairs once duplicates and self-pairings are removed). That will inevitably mean a lot of correlations which are statistically significant but in fact random. The classic case here is the “Ashes Stock Market Predictor.” It holds that if the England team wins, the market will rise in that year. If the Australia team wins, the market will fall. But it was clearly a statistical fluke. This particular case is too obvious to fool anyone, but other accidental correlations will be more subtle, and we can waste a lot of time getting excited about equally spurious phenomena.
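To put a rough number on the false-positive problem, here is a minimal Python sketch. It assumes the 200 variables are in fact unrelated and that each pair is tested at a conventional 5% significance level; the function names and the 5% figure are illustrative assumptions, not something from the talk.

```python
from math import comb


def expected_false_positives(n_variables: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return (distinct variable pairs, expected number of spurious 'significant' pairs).

    Assumes no real relationship exists between any two variables, so every
    significant result is a false positive.
    """
    pairs = comb(n_variables, 2)  # unordered pairs, no self-pairings
    return pairs, pairs * alpha


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Stricter per-test significance level (Bonferroni correction, one common remedy)."""
    return alpha / n_tests


pairs, spurious = expected_false_positives(200)
print(f"{pairs} distinct pairs, about {spurious:.0f} expected to look significant by chance")
print(f"Bonferroni-corrected threshold per test: {bonferroni_threshold(pairs):.2e}")
```

Under these assumptions, hundreds of pairs would pass a naive significance test without meaning anything, which is why a correlation found by scanning many variables deserves a much stricter bar than one you set out to test.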

For organizations now using big data and actively putting it to use, this relationship is in the middle of a reboot. Beyond that, big data lets you understand the larger market trends in your industry, helping you better plan the future of your organization.

Organizations will also use big data to better understand how to analyze and manipulate trends for their day-to-day business. For example, according to Forbes, Delhaize America is already using big data “to study the impact of local weather on store and category sales,” showing them that warmer weather increases the purchase of magazines, while decreasing the purchase of certain grilling meats.

Quality of Big data:

  • People, processes, and technology
  • Validity, the degree to which the data conforms to logical criteria
  • Completeness, the degree to which the data required to make decisions, calculations, or inferences is available
  • Consistency, the degree to which data is the same in its definition, business rules, format, and value at any point in time
  • Accuracy, the degree to which data reflects reality
  • Timeliness, the degree to which data reflects the latest available information
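Some of these dimensions can be turned into simple scores. The sketch below is one hedged way to measure completeness, validity and timeliness over a list of records; the field names (`customer_id`, `amount`, `updated`), the "non-negative amount" validity rule and the 30-day freshness window are all hypothetical choices for illustration.

```python
from datetime import date

# Hypothetical required fields for a record to count as complete.
REQUIRED_FIELDS = ("customer_id", "amount", "updated")


def completeness(records):
    """Share of records where every required field is present and non-empty."""
    if not records:
        return 0.0
    ok = sum(all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS) for r in records)
    return ok / len(records)


def validity(records):
    """Share of records whose amount is a non-negative number (an assumed logical rule)."""
    if not records:
        return 0.0
    ok = sum(
        isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0 for r in records
    )
    return ok / len(records)


def timeliness(records, today, max_age_days=30):
    """Share of records updated within max_age_days of today."""
    if not records:
        return 0.0
    ok = sum(
        isinstance(r.get("updated"), date) and (today - r["updated"]).days <= max_age_days
        for r in records
    )
    return ok / len(records)


records = [
    {"customer_id": "C1", "amount": 120.0, "updated": date(2024, 1, 10)},
    {"customer_id": "C2", "amount": -5.0, "updated": date(2023, 11, 1)},
    {"customer_id": "", "amount": 40.0, "updated": date(2023, 6, 1)},
    {"customer_id": "C4", "amount": 75.5, "updated": date(2024, 1, 14)},
]
today = date(2024, 1, 15)
print(completeness(records), validity(records), timeliness(records, today))
```

Accuracy and consistency are harder to score this way, since they need a trusted reference to compare against.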

“zero-touch processing.”

It may not be possible or economical to fix all data-quality issues, such as those associated with external data, at the source. In such cases, companies could employ middleware that effectively translates “bad data” into “usable data.” As an example, often the structured data in an accounts-payable system does not include sufficient detail to understand the exact commodity being purchased. Is an invoice coded “computing” for a desktop or a laptop? Work-arounds include text analytics that read the invoice text, categorize the purchase, and turn the conversion into a rule or model. The approach can be good enough for the intended uses and much more cost effective than rebuilding an entire enterprise-software data structure.
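The invoice work-around can be sketched in a few lines of Python. This is a deliberately simple keyword rule set, not a full text-analytics model; the `RULES` table, the keywords and the `categorize` function are hypothetical and would need tuning against real invoice text.

```python
import re

# Hypothetical keyword rules mapping invoice wording to a finer-grained category.
RULES = {
    "laptop": ("laptop", "notebook", "ultrabook"),
    "desktop": ("desktop", "tower", "workstation"),
}


def categorize(invoice_text, coded_category="computing"):
    """Refine a coarse invoice code using words found in the free-text description.

    Falls back to the original coded category when no rule matches.
    """
    words = set(re.findall(r"[a-z]+", invoice_text.lower()))
    for category, keywords in RULES.items():
        if words & set(keywords):
            return category
    return coded_category


print(categorize("Dell Latitude 14-inch Laptop, qty 2"))
print(categorize("HP tower workstation"))
print(categorize("Annual support renewal"))
```

Rules like these are cheap to maintain next to the source system, which is the point of the middleware approach: good enough for the intended use without rebuilding the data structure.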

Big data accuracy

  • Asking the right questions
  • Evaluating data quality
  • Comparing assumptions with reality
  • Understanding factors that can skew results and backing predictions with statistics
  • Heeding privacy issues
  • Combining more than one data source
  • Communicating findings in a meaningful way

Big data framework

  • Hypotheses about the decisions you’re making and the analysis you’re looking to perform in order to make them.
  • Descriptions of the data needed to feed the desired analysis, and their sources.
  • An understanding of the gaps – areas where you don’t currently have data – and how you plan to fill them and make decisions in the meantime.

Big data frameworks have evolved to support large-scale data processing, parallelising data tasks, data cleansing and data storage. Reporting and analytics capabilities have been transformed into decision-making, forecasting and prediction tools.

A big data platform has discovery and prediction capabilities. Discovery consists of clustering, outlier detection and affinity analysis. Clustering is detecting natural groupings. Outlier detection is detecting anomalies. Affinity analysis is identifying co-occurrence patterns. Prediction consists of classification, regression and recommendation. Classification is predicting categories. Regression is predicting values. Recommendation is predicting preferences.
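Two of the discovery capabilities are easy to illustrate with the Python standard library. The sketch below uses a z-score rule for outlier detection and pair counting for affinity analysis; both are simple stand-ins chosen for illustration (real platforms use more robust methods), and the function names and the threshold of 2 standard deviations are assumptions.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def co_occurrences(baskets):
    """Count how often each pair of items appears together in the same basket."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(set(basket)), 2))
    return counts


daily_orders = [10, 11, 9, 10, 12, 10, 11, 50]
print(outliers(daily_orders))

baskets = [["bread", "butter"], ["bread", "butter", "jam"], ["jam", "tea"]]
print(co_occurrences(baskets).most_common(1))
```

Either result is only a prompt for a question: an outlier may be fraud or a data-entry error, and a frequent pair may be a real affinity or just two popular items.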

Recommendations need to be backed up with sound judgement.

Businesses need to have a “revenue-driven” or “risk management-driven” business case for using big data.

Revenue-Driven Use Cases in Banking

Big data use cases in financial services include capital market trading analysis, retail banking customer management, risk management, compliance, regulations, predictive analytics, customer retention, churn analysis, social graph analysis, marketing campaign analysis, fraud detection and network monitoring.

Consider the uses for data in terms of technical, organisational, and data stewardship feasibility. Also look at how the data use fits in with the company’s existing project portfolio.

Other Factors to Look Into

   Bigger is Not Necessarily Better

Even though big data allows for the collection of masses of information, only a small percentage of that information is actually useful.

   Quantity Does Not Mean Quality

Having enormous piles of data where the data is supposed to represent “all” can end up skewing results. A case in point comes from social media, where every tweet may be collected and used to gauge overall public sentiment or mood. This method automatically fails the accuracy test as Twitter users do not represent the entire population. Many rarely tweet while others may have never even set up accounts.

   ‘Found’ Data is Not Always Truly Accurate Data

Big data often consists of “found” data, rather than data you purposely go out to collect. Found data may not include variables that play a role in the results. One example is credit rating agencies reporting firm facts on mortgages based only on data they collected at a time when the real estate market was soaring.

    Data Based on Behaviors Can be Misleading

Basing conclusions on past behaviors can be risky, particularly if you’re not sure what caused those behaviors. Here the errors don’t necessarily come from the data itself, but rather the interpretation of the data. This can be especially dangerous when people insist they “have the numbers to prove” whatever notion they’re touting.

Challenges for Enterprises

Enterprises face challenges based on their size, profile, dependency on their correspondents, and provision for correspondent services and capabilities. Performance and breakdown of big data solutions are the bottleneck points. Enterprises have traditionally struggled with data size, scale, extent, speed, efficiency, complexity and differing formats.

Emerging types of partnerships and vendors, especially in banking and telecom, are broad-based programs driven by cross-selling with telcos and retailers, and banks working with nimbler local and regional vendors. Customer demand for faster and more efficient payments, the entry of non-banks and the convergence of channels are the emerging trends in the banking business model.

The challenges faced in any vertical industry are bringing the channels together and handling big data from complex multi channel service environments.

Regulations, privacy, ethics, risk management, counterparty risk management and treasury technology compliant with corporate IT standards are the challenges enterprises face in adopting big data frameworks.

Best Practices

The best practices followed during big data analysis are evolving a legacy big data environment, having separate sandbox and production boxes, backup and archiving, using multiple caches to reduce latency, master data management and data cleansing.

Enterprises need to centralize data into a single high-quality, on-demand source using a “one touch” master data management (MDM) collection process.

Enterprises need to have a pilot program in advanced analytics to act as an incubator for developing big-data capabilities in their business units and creating a path to additional growth.

Within a business unit, big data prototyping should be done on a public cloud, as it can be scaled instantly. After prototyping is done, the big data solution is moved to a private cloud. Boundary crashes can be avoided by implementing far limits on scalability. Streaming data analytics is implemented for the specific cases where it applies.

The data world is modelled by dividing the data into dimensions and facts. Separate data sets are integrated from separate data sources. Structured and unstructured data are integrated. Name-value pair data sets are stored in NoSQL data stores.
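As a minimal sketch of the dimensions-and-facts split, the Python below keeps descriptive attributes in a dimension table and measurements in a fact table, then rolls the facts up by a dimension attribute. The store and sales data, the `region` attribute and the `sales_by` helper are made up for illustration.

```python
from collections import defaultdict

# Dimension table: descriptive attributes keyed by an id (hypothetical data).
store_dim = {
    1: {"region": "North", "format": "supermarket"},
    2: {"region": "North", "format": "convenience"},
    3: {"region": "South", "format": "supermarket"},
}

# Fact table: measurements that reference the dimension by key.
sales_facts = [
    {"store_id": 1, "amount": 100.0},
    {"store_id": 2, "amount": 50.0},
    {"store_id": 3, "amount": 25.0},
]


def sales_by(attribute, facts, dim):
    """Total the fact amounts grouped by one attribute of the store dimension."""
    totals = defaultdict(float)
    for fact in facts:
        totals[dim[fact["store_id"]][attribute]] += fact["amount"]
    return dict(totals)


print(sales_by("region", sales_facts, store_dim))
print(sales_by("format", sales_facts, store_dim))
```

The same facts can be sliced by any attribute in the dimension without touching the measurements, which is what makes the split useful for reporting.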

Big data governance consists of data quality, metadata management, master data management, privacy, security and compliance. IT needs to work with management and support cross-organisational cooperation. Private data needs to be secured, while shared data will be shared with third parties, vendors, institutions and other enterprises. Roles are identified within the enterprise for data stewards, sponsors, program drivers and users.

Assign a business owner to data. Data must be owned to become high quality. Companies can’t outsource this step. Someone on the business side needs to own the data, set the pace of change, and have the support of the C suite and the board of directors to resolve complex issues.

Enterprise Adoption & Overcoming Challenges & Traps

Enterprises are enhancing their capabilities for establishing data gathering and assembly guidelines, guidelines for external data sharing, data security and privacy, alignment of new product releases with customer preferences, and expertise in big data analysis and performance data analysis.

Enterprises are focussing their big data initiatives on tactical business objectives, product information management, performance management, business execution correction, innovation through new products and predictive capabilities.

Big Data Platform

A big data platform has operational services, data services and enterprise readiness services to provide high availability, disaster recovery and security. Data visualisation apps and business tools use textual analysis, predictive analysis and statistical analysis services. Data acquisition, data refinement, data integration and data management are part of the data services layer. Semi-structured data analysis, structured and unstructured data, and syndicated data are part of the data repository layer.


