Big Data: Trouble and Advantage for Big Firms

What is Big Data?

Data that cannot be stored or processed on a single storage device, because of limited capacity and limited read/write (transfer) speeds, is known as big data. As data grows, access times on a single device increase, and capacity will eventually run out even with the largest available storage devices. This is where the concept of distributed storage comes in.

The three main problems of big data, often called the three Vs:

  1. Volume
  2. Velocity
  3. Variety

How does Distributed Storage work?

There is one name node (master) connected to many data nodes (slaves) within a network, forming a master-slave topology. When data is to be stored on this system, it is divided into blocks that are written to different data nodes in parallel. This increases velocity, since the blocks are read and written on different data nodes at the same time. It also increases storage capacity: with 10 data nodes of 10 GB each, the system collectively provides 100 GB, raising the overall capacity of the storage server. The whole setup is known as a cluster and forms a distributed storage system.

HDFS (Hadoop Distributed File System) is one of the most widely used systems for distributed storage.
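The block-splitting idea described above can be sketched as a toy model in Python. This is only an illustration of the concept, not real HDFS (which defaults to 128 MB blocks and also replicates each block across nodes for fault tolerance); the tiny block size and in-memory "nodes" here are invented for demonstration:

```python
BLOCK_SIZE = 4  # bytes per block in this toy; real HDFS defaults to 128 MB

class Cluster:
    def __init__(self, num_nodes):
        # Each "data node" is just a dict mapping block id -> bytes.
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, data):
        """Split data into fixed-size blocks and spread them round-robin
        across the data nodes; return the placement map needed to read back."""
        placement = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = len(placement)
            node_id = block_id % len(self.nodes)
            self.nodes[node_id][block_id] = data[offset:offset + BLOCK_SIZE]
            placement.append((node_id, block_id))
        return placement

    def get(self, placement):
        """Reassemble the original data by reading each block from its node."""
        return b"".join(self.nodes[n][b] for n, b in placement)

cluster = Cluster(num_nodes=10)
plan = cluster.put(b"some data to distribute")
print(cluster.get(plan) == b"some data to distribute")  # True
```

Because consecutive blocks land on different nodes, reads and writes can happen in parallel, which is exactly where the velocity gain in the paragraph above comes from.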

Some of the companies using Big Data:

1. UPS

The company now tracks data on 16.3 million packages per day for 8.8 million customers, with an average of 39.5 million tracking requests from customers per day. The company stores over 16 petabytes of data.

It launched the ORION (On-Road Integrated Optimization and Navigation) program, which collects data from 46,000 vehicles using telematics sensors. It helps drivers reconfigure pickup and drop-off times, relying heavily on online maps. By 2011 the project had already cut 85 million miles off of daily routes, saving more than 8.4 million gallons of fuel.

2. United Healthcare

United Healthcare has historically focused on analyzing structured data, and even advertises its analytical capabilities to consumers.

Now it is turning its analytical attention to unstructured data, in particular customer attitudes expressed in recorded voice calls.

To analyze the text data, United Healthcare uses a variety of tools. The data initially goes into a "data lake" using Hadoop and NoSQL storage, so the data doesn't have to be normalized. The natural language processing — primarily a "singular value decomposition," or modified word count — takes place on a database appliance. A variety of other technologies are being surveyed and tested to assess their fit within the "future state architecture." United also makes use of interfaces between its statistical analysis tools and Hadoop.
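The "singular value decomposition of a word count" mentioned above can be illustrated in a few lines of NumPy. The call transcripts below are invented stand-ins, and this is a minimal sketch of the general technique (counting words into a term-document matrix, then factoring it), not United Healthcare's actual pipeline:

```python
import numpy as np

# Invented stand-ins for transcribed customer calls (illustrative only).
docs = [
    "claim denied please help with my claim",
    "happy with coverage thank you",
    "claim took too long very unhappy",
]

# Build a term-document count matrix: one row per call, one column per word.
vocab = sorted({w for d in docs for w in d.split()})
counts = np.array([[d.split().count(w) for w in vocab] for d in docs],
                  dtype=float)

# SVD factors the count matrix into latent "topics"; the largest singular
# values carry most of the signal, so keeping only a few of them gives a
# compact summary of what customers are talking about.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
print(S.round(2))  # singular values, largest first
```

Keeping only the top components of `U` and `Vt` is the standard way to reduce thousands of raw word counts to a handful of themes that can then be tied to customer-attitude labels.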

3. Caesars Entertainment

The company has data about its customers from its Total Rewards loyalty program, web clickstreams, and real-time play in slot machines. It has traditionally used all those data sources to understand customers, but it has been difficult to integrate and act on them in real-time, while the customer is still playing at a slot machine or in the resort.

To pursue this objective, Caesars has acquired both Hadoop clusters and open-source and commercial analytics software. It has also added some data scientists to its analytics group. Caesars is also beginning to analyze mobile data, and is experimenting with targeted real-time offers to mobile devices.

4.

The company is heavily focused on customer-oriented analytical applications involving personalization, ad and email targeting, and search engine optimization. This work is growing at a 50% annual rate, faster than any other part of the business.

The tools in use include open-source software like Hadoop, R, and Impala, as well as purchased software such as SAS, IBM DB2, Vertica, and Tableau. Analytical initiatives are increasingly a blend of traditional data management and analytics technologies and emerging big data tools. The analytics group employs a combination of machine learning approaches and traditional hypothesis-based statistics.

5. Bank of America

With a very large amount of customer data across multiple channels and relationships, the bank historically was unable to analyze all of its customers at once and relied on systematic samples. With big data technology, it can increasingly process and analyze data from its full customer set.

The primary focus of the bank’s big data efforts is on understanding the customer across all channels and interactions and presenting consistent, appealing offers to well-defined customer segments. For example, the Bank utilizes transaction and propensity models to determine which of its primary relationship customers may have a credit card or a mortgage loan that could benefit from refinancing at a competitor.

A new program, "BankAmeriDeals," provides cash-back offers to holders of the bank's credit and debit cards based on analyses of where they have made payments in the past. There is also an effort to understand the nature of and satisfaction from customer journeys across a variety of distribution channels, including online, call centre, and retail branch interactions.

6. Sears

“We’re investing in real-time data acquisition as it happens,” says Oliver Ratzesberger, Vice President of Information Analytics and Innovation at Sears Holdings. “No more ETL. Big data technologies make it easy to eliminate sources of latency that have built up over time.” The company is now leveraging the open source projects Apache Kafka and Storm to enable real-time processing. “Our goal is to be able to measure what’s just happened.”
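"Measuring what's just happened" usually means keeping statistics over a sliding time window as events stream in. The sketch below is a minimal in-memory stand-in for that pattern, not actual Kafka or Storm code (both of which need a running cluster); the event timestamps are invented:

```python
from collections import deque

class RollingCounter:
    """Count events that occurred within the last `window_seconds`,
    the basic building block of 'what just happened' metrics."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps of recent events, oldest first

    def record(self, ts):
        self.events.append(ts)
        self._evict(ts)

    def count(self, now):
        self._evict(now)
        return len(self.events)

    def _evict(self, now):
        # Drop anything older than the window so memory stays bounded.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()

counter = RollingCounter(window_seconds=60)
for t in [0, 10, 30, 90]:      # simulated purchase timestamps (seconds)
    counter.record(t)
print(counter.count(now=100))  # 1: only the t=90 event is within the last 60s
```

In a real deployment, Kafka would deliver the events and a Storm topology would hold state like this per product or per store, but the windowing logic is the same idea.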

7. Schneider National

What has changed in Schneider’s business over the past several years is the availability of low-cost sensors for its trucks, trailers and intermodal containers. The sensors monitor location, driving behaviours, fuel levels and whether a trailer/container is loaded or empty. Schneider has been transitioning to a new technology platform over the last five years, but leaders there don’t draw a bright line between big data and more traditional data types. However, the quality of the optimized decisions it makes with the sensor data — dispatching of trucks and containers, for example — is improving substantially, and the company’s use of prescriptive analytics is changing job roles and relationships.

New sensors are constantly becoming available. For example, fuel-level sensors, which Schneider is beginning to implement, allow better fueling optimization, i.e., identifying the optimal location at which a driver should stop for fuel based on how much is left in the tank, the truck’s destination and fuel prices along the way. In the past, drivers have entered the data manually, but sensor data is both more accurate and free of bias.
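The fueling-optimization decision described above can be reduced to a small cost comparison: among the stations reachable on the fuel left in the tank, pick the cheapest. All the numbers below (mileage, stations, prices) are invented for illustration, and a real optimizer would also weigh detours, tank capacity, and prices further down the route:

```python
MILES_PER_GALLON = 6.5  # assumed truck fuel economy (illustrative)

def best_fuel_stop(fuel_left_gal, stations):
    """stations: list of (name, miles_ahead, price_per_gallon).
    Return the cheapest station reachable on the remaining fuel."""
    reachable_miles = fuel_left_gal * MILES_PER_GALLON
    reachable = [s for s in stations if s[1] <= reachable_miles]
    if not reachable:
        raise ValueError("no station reachable; refuel immediately")
    return min(reachable, key=lambda s: s[2])

# Hypothetical stations along the route: (name, miles ahead, $/gallon).
stations = [("A", 40, 3.89), ("B", 120, 3.49), ("C", 300, 3.19)]
print(best_fuel_stop(fuel_left_gal=25, stations=stations))
# -> ('B', 120, 3.49): C is cheaper but out of range on 25 gallons
```

Feeding this decision from a fuel-level sensor rather than a driver's manual estimate is exactly the accuracy gain the paragraph above describes.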

8. Facebook

Facebook generates 4 petabytes of data per day — that’s 4 million gigabytes. All that data is stored in what is known as the Hive, which contains about 300 petabytes of data.

VP of Engineering Jay Parikh explained why this is so important to Facebook: “Big data is about having insights and making an impact on your business. If you aren’t taking advantage of the data you’re collecting, then you just have a pile of data, you don’t have big data.” By processing data within minutes, Facebook can roll out new products, understand user reactions, and modify designs in near real-time.