written by KC Cheung on 01 Nov 2019
Data science is one of the most exciting emerging fields. As we will see throughout this article, it is becoming an increasingly important part of a company’s future. First, we will explain, in simple terms, how data science works and how it can be applied to unstructured data sets. We will also look at how widely data science is already in use. From medical research to dynamic pricing and driverless cars, data science is increasingly leading the way.
What is Data Science?
Data science is a term given to the practice of analysing raw data to discover any hidden patterns. Various applications and tools such as machine learning and sophisticated algorithms are all used in this process. Unlike other forms of data analysis, it can be applied to both structured and unstructured data. Data science is a more in-depth, detailed way of analysing data than data analytics.
Put simply, data analysts use processing history to explain data.
Data scientists employ exploratory analysis and sophisticated tools to uncover new insights and predict future events. This makes it useful for predictive causal analysis. These are models that predict the possibility of a certain event occurring in the future. It is also useful for prescriptive analytics, intelligent models capable of making their own decisions and learning within dynamic parameters.
Data science is a forward-focused approach to utilising data. By exploring current or past data with sophisticated tools and algorithms, data science is able to identify patterns and make accurate predictions. Employed correctly, it is capable of answering open-ended questions such as what, why or how certain events can occur.
Why do we need Data Science?
Until relatively recently, data was structured and small in size. It could be analysed either manually or with simple tools and algorithms. Today, thanks to technological developments, we are producing more and more data. This is often semi-structured or completely unstructured.
For example, it is estimated that over 80% of enterprise data is unstructured. This is only going to increase. To help us make sense of this growing mass of unstructured data we need more complicated analytical tools and sophisticated algorithms. Data science is the process of using these advanced tools to make sense of large amounts of unstructured data. As the amount of unstructured data we produce increases, data science only grows in importance.
The Data Science Life Cycle
The CRISP-DM model is one of the most commonly applied models when it comes to applying data mining. This sets out the process in six clear steps. A similar life cycle can be used to describe data science.
The Stages of the Data Science Life Cycle
The data science life cycle can be set out in the following way:
1. The acquisition of data
This is simply the process of acquiring data from internal and external sources.
2. The preparation of Data
Here the data is cleaned and shaped into a usable form. These first two steps are vital for data science to function correctly. Because data science is still an emerging discipline, data scientists often benefit from working alongside people experienced in understanding data. Typically, once understanding has been achieved, the project will begin to resemble an engineering exercise. Consequently, it follows a defined set of rules and exit criteria. This allows data scientists to make informed decisions, allowing for optimisation.
3. Modelling and Hypothesis
This stage is commonly used on statistical samples in data mining. However, in data science, it is applied, via machine learning, to all types of data. It is at this stage of the process that training sets and models are created. Validation or test sets are also produced now.
4. Evaluation and Interpretation of Data
Once modelling has taken place, data is constantly tested, re-evaluated and reshaped. Eventually, a usable model is created.
Once a usable model has been created, it is deployed. This is often initially done in a limited or trial form. If further improvements are necessary, they are made now.
Once the model has been optimised it can be rolled out into larger operations. While the model is now being deployed its performance is still monitored and evaluated.
While models are now operational they are still constantly improved. The more data a model can work through the more it learns, becoming more refined and capable in the process.
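The modelling and evaluation stages above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the data is synthetic, the "model" is a single cut-off value rather than a real machine learning algorithm, and the names split_dataset, fit_threshold and accuracy are hypothetical helpers invented for this example.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def accuracy(threshold, rows):
    """Fraction of rows where the prediction (x >= threshold) matches the label."""
    correct = sum((x >= threshold) == label for x, label in rows)
    return correct / len(rows)


def fit_threshold(train):
    """Pick the cut-off on x that classifies the training rows most accurately."""
    candidates = sorted(x for x, _ in train)
    return max(candidates, key=lambda t: accuracy(t, train))


# Synthetic data (an assumption for illustration): the label is True when x >= 50.
data = [(x, x >= 50) for x in range(100)]

train, val, test = split_dataset(data)
model = fit_threshold(train)  # modelling: learn from the training set only

print("validation accuracy:", accuracy(model, val))  # used while tuning
print("test accuracy:", accuracy(model, test))       # final, held-back check
```

The point of keeping three separate sets is that the validation set guides improvements while the test set is held back, so the final evaluation is made on data the model has never seen.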
What is Data Science used for?
As we have seen, data science is a useful way of organising and reading large data sets. This is useful in a range of scenarios. From creating personalised marketing campaigns to powering dynamic pricing and product recommendation systems, data science is constantly growing in importance. Additionally, data science is helping car manufacturers develop driverless cars and other forms of automated technology.
Data Science Uses in the Banking Industry
Data science can be used to develop insights. This allows banking operations to make informed, data-driven decisions. One of the most important applications of data science is in the field of credit risk modelling. This process analyses applicants, or customers, helping the bank to decide whether to approve their loan or mortgage application.
Data Science is the Power Behind Contract Intelligence
Major banking firm JPMorgan Chase has introduced COiN, or Contract Intelligence. COiN uses data science, algorithms and Natural Language Processing (NLP) to process customer loan applications. Processing around 12,000 applications a year, COiN reduces the workload of JPMorgan staff by 360,000 hours. Not only does this help to speed up applications, it also reduces human error.
These intelligent applications also allow JPMorgan to analyse a range of potential factors. As well as financial information this also includes factors such as the age or education of the applicant. Analysing as much information as possible allows the algorithm to present a rounded, informed picture of the client. This detailed picture means that the bank is in a better position to make an informed decision on the credit risk of the applicant.
Data Science can Help to Build a Meaningful Relationship
Other banks are using this wealth of data to build a meaningful relationship with their customers. By using the data we generate to monitor our lives and milestones, banks can build a better understanding of their clients and their current needs. For example, if a client has just become a parent, the bank can offer children’s accounts or college savings accounts.
Srini Nallasivan is Chief Analytics Officer at U.S. Bank. Nallasivan explained that “Our goal is to build meaningful, trusted relationships with our customers, because the deeper these relationships get, the more involved in their financial lives we can be.” By using data in this way, banks and other organisations can help to make their clients’ lives easier. This, in turn, helps to establish and maintain a healthy relationship.
Credit and Insurance Applications
There are potentially a number of different ways that an insurance company can fall victim to fraud. Data science can be used to power algorithms, creating predictive systems. These systems can identify claims that are likely to be fraudulent.
Information such as the claimant’s financial history, recorded income, age and status is considered by the model when processing each claim. Using predictive models to filter out likely fraudulent claims can increase the administrative efficiency of claims clerks. This can save a company, and the industry, not only time but also money.
One of the UK’s largest insurers, Aviva, is using data science, machine learning and AI to monitor and reduce insurance fraud. These applications are also helping the company to improve its insurance provision. The company continues to implement a number of applications such as its fraudulent behaviour identification tool. This allows Aviva to monitor insurance quotes in real time. Tom Gardiner is the head of fraud at Aviva UK General Insurance.
He observed that data science has allowed the company to move “the fight against claims fraud from ‘detection’ to ‘prevention’, from the point of claim to the point of sale, and from post-sale to real-time.” This approach is allowing Aviva to detect 20% more fraudulent claims. The average fraudulent claim is worth over £14,000. Aviva’s implementation of data science has helped it to detect nearly £14 million worth of such fraudulent claims. This saves the company money and helps to keep its premiums low.
Data Science is Affecting your Insurance Premiums
Thanks to regular reminders from your insurance provider and the increase in comparison sites, it is easier than ever before to get an insurance quote. For many people, it is also becoming more affordable. This is thanks to the use of data science, AI and machine learning tools.
Chris Ganje, the CEO and co-founder of AMPLYFI, has commented that technology is “enabling benefits on a number of fronts, such as speeding up quotes and pricing policies, enacting faster claims settlements, better fraud detection, and better customer profiling, for example, identify safer drivers through telematics data.”
Large insurance providers such as Allianz are already making use of this application of data science. By applying the iGenius artificial intelligence platform, Allianz agents are able to access real-time information. This includes business updates, client profiles and anything else that can help them to make an informed decision. Agostine Ferrara is the chief operating officer of Allianz.
He compared the virtual advisor to an “extra colleague for our network of professional Agents, an always-available coworker able to process vast amounts of data in a very short time and effectively support our Agents in their day-to-day operations, tapping into the extraordinary potential of conversational artificial intelligence.”
With data made easily accessible, agents are able to offer the best possible options and provide the correct level of cover at the optimum price.
A Holistic Approach to Insurance
This application and use of data science have led to the emergence of technology-driven insurance providers such as Sherpa. They aim to break down insurance into a more holistic form of cover. Sherpa uses a custom-built algorithm to analyse customer data and information.
This is then turned into a score, indicating how big a risk the consumer is to insure. By taking a holistic, personalised approach to insurance provision Sherpa aims to provide an affordable, complete form of insurance provision.
How Data Science Is Changing the Travel Industry
One of the most useful applications of data science is that it allows you to turn data into practical knowledge. The travel industry is a particularly data-rich industry. Data is produced on everything including customer price ranges, chosen destinations, additional services required and preferred travel modes. This makes it an industry prime for data science.
The Adoption of Dynamic Pricing Models
Dynamic pricing is not confined to the travel industry. However, it is here that it is having a major impact. Dynamic pricing sees a company, or provider, use data to accurately segment customers or consumers. Each customer group can then be offered a differently priced product. This offer is based on information produced by data science and is based on a range of factors.
One of the earliest adopters of this strategy is Airbnb. Airbnb uses data science and a sophisticated, dynamic algorithm focused on pricing. This algorithm takes into account a number of diverse categories. These include lead times, reviews of the property and amenities provided. It also includes a number of key factors such as time of year, day of the week and any special events scheduled to occur in the area. This algorithm can then be used by the property owner to automatically determine their price per night. By applying sophisticated pricing algorithms, Airbnb is able to maximize its potential bookings for all available dates.
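To make the idea concrete, here is a minimal rule-based sketch of how factors like those listed above could feed into a nightly price. It is not Airbnb’s algorithm, which is not described in any detail here. The Listing class, the nightly_price function and every multiplier are assumptions chosen purely for illustration; a real system would learn such adjustments from booking data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Listing:
    base_price: float      # owner's baseline nightly rate
    average_review: float  # average star rating, 1.0 to 5.0


def nightly_price(listing, stay_date, booking_date, local_event=False):
    """Adjust a base nightly rate using a few hand-picked, illustrative multipliers."""
    price = listing.base_price
    # Day of the week: Friday and Saturday nights are assumed to be in higher demand.
    if stay_date.weekday() in (4, 5):
        price *= 1.15
    # Time of year: June to August is assumed to be peak season.
    if stay_date.month in (6, 7, 8):
        price *= 1.20
    # Special events scheduled in the area push demand up.
    if local_event:
        price *= 1.30
    # Lead time: discount nights that are still unsold close to the stay date.
    lead_days = (stay_date - booking_date).days
    if lead_days < 7:
        price *= 0.90
    # Reviews: well-reviewed properties are assumed to command a small premium.
    if listing.average_review >= 4.5:
        price *= 1.05
    return round(price, 2)


flat = Listing(base_price=100.0, average_review=4.0)

# A midweek night in March, booked well in advance, with no local event.
quiet = nightly_price(flat, date(2019, 3, 12), date(2019, 2, 1))

# A Saturday night in July during a local event.
busy = nightly_price(flat, date(2019, 7, 6), date(2019, 5, 1), local_event=True)

print(quiet, busy)
```

Each rule maps to one of the factors mentioned above, which keeps the example easy to follow even though fixed multipliers are far cruder than a learned model.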
Airlines and Hotels are Increasingly Adopting Dynamic Pricing Models
Revenue management software provider PROS is helping a number of airlines adopt dynamic pricing models. The company has noticed that this is a growing trend, with more companies seeking to adopt this use of data science. The director of product management at PROS is John McBride. He said that “Based on our backlog of projects, there will be a handful of large carriers that move toward dynamic pricing science.”
This adoption of data science allows airlines to offer a continuous pricing strategy. This means that prices are more flexible and can be altered based on the individual consumer. Airlines are also able to offer bundled pricing offers and unique fare structures based on the particular market and departure date.
Hotels are also making use of data science-driven dynamic pricing strategies. The El Cortez Hotel and Casino in Las Vegas is using these tools to successfully alter their pricing strategy. This application sees the hotel reducing the number of discounts and offers on dates where the hotel is likely to be sold out anyway.
It also helps the hotel to maintain better rate parity between online customers, walk-in guests and third-party providers. This application of data science and dynamic pricing has seen the hotel increase its revenue by 30%. Occupancy rates have also improved by 4.5%.
Predicting Potential Delays
Data science can also be used to help companies manage disruptions, or potential disruptions, more efficiently. Alongside intelligent tools and sophisticated algorithms, it can be trained to predict and monitor potential disruptions. These disruptions can be caused by a range of factors such as weather, staff strikes or mechanical failures.
The system can then alert staff to potential disruptions before they have an impact, allowing contingency plans to be put into operation. Acting as swiftly as possible keeps disruption to travel to a minimum, if not avoiding it completely. Australian airline Qantas was one of the earliest adopters of data science applications in travel predictions. By using the Amadeus solution, Qantas has been able to implement a Schedule Recovery system. This is a recommendation engine that can identify potential disruptions.
Using data science in this way has reportedly helped Qantas to reduce the proportion of cancelled flights to just 3.4%. The same approach can be applied to large-scale modes of travel, such as airlines, or to smaller, everyday transportation methods. In these applications, data science can optimise bus routes or help commuters avoid traffic jams.
Keeping people informed of potential disruptions in real time allows travellers to feel they have more control over their journey. This helps to keep customer dissatisfaction to a minimum. A Travel Debrief survey by PSFK reveals that 83% of industry experts believe that giving travellers real-time assistance and more control over their journey is increasingly important.
Marketing is Making the Most of Data Science
The information produced by data science is a marketer’s dream. Consequently, it is already being used to improve marketing strategies in a number of ways. These include content marketing, customer engagement, SEO and cross-selling. Marketing relies on getting the most from the information that you have. This makes it a prime area for data science to positively impact.
By using data science to gain insights into their customers, companies can improve marketing strategies. Used effectively, these applications can also help to improve sales and help businesses offer relevant products or services to their customer base. This makes for a better relationship, and possibly a longer-term, more profitable one.
By segmenting the customer base, data science-driven algorithms help businesses to better target customers. This means you can devise marketing campaigns to solely target a certain age or gender group, or a shared previous purchase or average spend.
Targeting customers with products or services they are likely to engage with helps businesses to increase the customer’s lifetime value. Data science can also help to predict which customers are unlikely to further engage with your brand. By predicting the likelihood of engagement you are better able to target marketing campaigns, saving you time and money.
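A very simple form of segmentation can be sketched as follows. This is a rule-based illustration only: the customer records are made up, and the segment_customers function, the age bands and the spend cut-off are all assumptions for the example. In practice, segments are more often discovered from the data with clustering techniques than defined by hand.

```python
from collections import defaultdict


def segment_customers(customers, spend_cutoff=100.0):
    """Group customer names by age band and by high or low average spend."""
    segments = defaultdict(list)
    for customer in customers:
        age_band = "under_35" if customer["age"] < 35 else "35_plus"
        spend_band = (
            "high_spend" if customer["avg_spend"] >= spend_cutoff else "low_spend"
        )
        segments[(age_band, spend_band)].append(customer["name"])
    return dict(segments)


# Made-up customer records, for illustration only.
customers = [
    {"name": "Ana", "age": 28, "avg_spend": 40.0},
    {"name": "Ben", "age": 31, "avg_spend": 150.0},
    {"name": "Cara", "age": 52, "avg_spend": 220.0},
    {"name": "Dev", "age": 47, "avg_spend": 60.0},
    {"name": "Eli", "age": 24, "avg_spend": 130.0},
]

segments = segment_customers(customers)
for segment, names in sorted(segments.items()):
    print(segment, names)
```

Each resulting group can then be sent its own campaign, for example a premium offer for younger, high-spending customers.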
Segmentation can Help to Improve Brand Engagement
Stationery firm Paper Style has been using data science-driven algorithms to power its segmented email marketing campaign. Here an algorithm divided consumers into two segments: brides, and friends of the bride who are helping to plan the wedding. These two groups were sent targeted emails promoting wedding-related products and services. This approach increased Paper Style’s email open rates by 255%. It also resulted in a 161% increase in people clicking through to its website.
This approach is not confined to sales. Nuffield Health worked alongside the marketing agency House of Kaizen to develop a series of segmented landing pages. Each page was designed to appeal to a different segmented client group. Following their engagement with the landing page, consumers were sent personalised emails. This helped to increase Nuffield Health’s conversion rate from 1% to 8%.
Improving Sales Conversion
By using data science to segment a customer base, we are able to identify which groups are likely to spend the most money and when they are more likely to engage. Once you have identified the customers that are most likely to convert their engagement into a sale, you can target them with a compelling offer. This means cross-selling or up-selling.
One of the algorithms that can make these recommendations is the Apriori algorithm. Algorithms of this kind drive powerful recommendation engines such as those used by Amazon, the online retail giant. Here, users’ previous searches and purchases are logged. The next time you log on, you will receive a list of products you might be interested in.
Also when you are searching for a particular product you will receive a list of items that are commonly purchased alongside it. This application of data science encourages customer engagement and additional purchases. In turn, this helps to increase profits.
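The core of the Apriori algorithm is short enough to sketch here. The version below is a compact teaching implementation, not the code behind any retailer’s recommendation engine. The shopping baskets are made up, and bought_together is a hypothetical helper added to show how frequent pairs can be turned into "commonly purchased alongside" suggestions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset that appears in at least
    min_support (a fraction between 0 and 1) of the transactions."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {
        frozenset([item])
        for item in items
        if support(frozenset([item])) >= min_support
    }

    frequent = {}
    k = 1
    while current:
        frequent.update({itemset: support(itemset) for itemset in current})
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c
            for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def bought_together(frequent, item):
    """Items found in a frequent pair with `item`, ranked by confidence."""
    base = frequent.get(frozenset([item]))
    if not base:
        return []
    scores = {}
    for itemset, pair_support in frequent.items():
        if len(itemset) == 2 and item in itemset:
            (other,) = itemset - {item}
            scores[other] = pair_support / base  # confidence of item -> other
    return sorted(scores, key=lambda other: (-scores[other], other))


# Made-up shopping baskets, for illustration only.
baskets = [
    {"tent", "sleeping bag", "torch"},
    {"tent", "sleeping bag"},
    {"tent", "torch"},
    {"sleeping bag", "torch"},
    {"tent", "sleeping bag", "stove"},
]

frequent = apriori(baskets, min_support=0.4)
suggestions = bought_together(frequent, "tent")
print(suggestions)
```

The prune step is what makes Apriori efficient: an itemset can only be frequent if all of its subsets are frequent, so most candidate combinations are discarded without ever being counted.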
Data Science in Retail
From local stores to international corporations, data science is helping to improve business models and outcomes. Many companies are using this information to forecast demand. One of the most significant impacts of data science is in the area of pricing strategies.
Pricing strategies range from offering discounts to “happy hour” offers, such as those operated by Uber. These can help to increase profits in relation to market conditions. By lowering prices when interest in products is low, companies can try to maintain a steady income level. Then, just before demand is predicted to rise, prices can be increased. Data science can help companies to forecast product demand, allowing them to alter prices accordingly.
A report by Deloitte reveals that managing pricing strategies in this way can help a company to increase its profits by between 2% and 7% in a year. For Uber users, this means paying more for a ride during peak rush-hour times.
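The link between a demand forecast and a price change can be shown with a small sketch. It assumes the simplest possible forecast, a moving average of recent sales, and a capped price adjustment. The function names, the three-period window and the 10% cap are illustrative assumptions, not a description of how Uber or any other company sets prices.

```python
def forecast_demand(history, window=3):
    """Forecast next-period demand as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def adjust_price(base_price, forecast, typical_demand, max_change=0.10):
    """Move the price in the direction of forecast demand, capped at +/- max_change."""
    relative_demand = forecast / typical_demand - 1.0
    change = max(-max_change, min(max_change, relative_demand))
    return round(base_price * (1 + change), 2)


# Made-up weekly unit sales, for illustration only.
weekly_sales = [100, 120, 140, 160]

expected = forecast_demand(weekly_sales)
new_price = adjust_price(base_price=20.0, forecast=expected, typical_demand=100)
print(expected, new_price)
```

When forecast demand is above its typical level the price rises, and when it is below the price falls, with the cap preventing sudden swings.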
Another famous user of dynamic pricing and demand forecasting is Amazon. The retail giant, most famously during Black Friday, regularly adjusts product prices based on predicted customer demand. A survey of US consumers by Retail Systems Research revealed that amongst younger millennial consumers, 14% loved the practice of dynamic pricing.
Data Science in Social Media
Social media is an increasingly important part of people’s lives. It has been reported that the average American spends 2 hours a day on various social networks. Using social media generates a mass of data that, thanks to data science, can be put to good use. This information reveals the devices you use, who you interact with and your interests. It also details the times and locations you are most likely to access social media from. Some data scientists have theorised that it is possible to approximate a person’s IQ by analysing the pattern of their likes on Facebook.
Studying a person’s social media usage is a form of sentiment analysis. This can help a company build up an accurate portrait of you without ever directly interacting. OkCupid is an online dating site. Upon signing up for the site the user creates a profile by answering a series of seemingly unrelated questions.
Christian Rudder, a co-founder of the service, revealed in a 2014 interview that the mass of data available to the company meant that they could make outlandish predictions. These include whether the customer’s parents had divorced before they reached the age of 21. With social media only growing in influence, it seems that the mass of data we create, and what we inadvertently reveal about ourselves, will only grow.
Creating Personalised Marketing
Data science allows companies to turn your use of social media, and the data it generates, into effective, personalised marketing campaigns. A University of Texas study revealed that this helps companies to reach customers by making them feel as if they have control over the information they see. It also helps to prevent information overload.
For example, streaming giant Netflix has over 4,000 programs available to view on its platform. Its “recommended for you” section helps viewers to choose what to watch next by suggesting programs based on previous choices.
Retargeting is the practice of using targeted advertising on people who have interacted with your brand or website. For example, if you visit a clothing website once, you may then be sent messages or adverts showing similar products. Anyone who has ever browsed Amazon will have noticed that targeted adverts will then appear in the browser for the next few days.
Similarly, Facebook makes videos for users detailing their interactions and significant friendship moments with another user or users. While these videos, in this context, are only relevant to you and your friendship group, they can be expanded. For example, Cadbury used data science and algorithms to collect data, such as age or location, from their followers’ Facebook profiles. This information was then used to create videos filled with personal pictures and moments, and to match users to a brand of Cadbury chocolate.
Data Science in Self Driving Vehicles
A major benefit of data science is that it allows us to quickly and accurately process large amounts of data. One area already making the most of this application is the field of automation. From blind spot detection to automatic braking systems and lane departure warnings, data science is helping vehicles to read the road. The key to self-driving vehicles, such as those made by Tesla, is a complex system of sensors.
These continuously map everything that surrounds the car, such as curbs or lane markings. Along with radar, lidar and cameras they also identify moving hazards such as pedestrians, read road signs and traffic lights. Meanwhile, ultrasonic detectors accurately map the short-range surroundings of the car. Additionally, accelerometers, altimeters, GPS and other tools are fitted to the car. All of this produces a mass of data.
Data science has enabled the creation of software capable of quickly and accurately reading the mass of data a car produces. The more a car is used, and the more data is processed, the better the system becomes at reading the road. This means that in time cars will become more capable of independently reading the world around them.
Self Flying Planes
While motoring and tech companies focus on self-driving cars, NASA is reportedly working towards the concept of a single pilot cockpit. This would see the first officer, traditionally the copilot, remaining on the ground where they would be able to monitor several flights at once.
Should an emergency arise, the first officer would be able to control the flight of the plane from the ground. This may eventually see in-air pilots becoming completely redundant. Just as data science is enabling the automation of vehicles, it may also allow for single-pilot, or even autonomous, flights. While NASA may be interested in this application, the major airlines are not currently seriously exploring the possibilities it could offer.
Drones and Data Science to Enable Inspection
Towers, power cables and phone masts can all be logistically difficult to inspect regularly and safely. Additionally, inspections of infrastructure, especially if it spans a state or nation, can generate a large volume of data. Elia, the Belgian energy provider, is using drones and data science to ease this tricky process.
They are using a data management and reporting system to ease processing and access to captured data. This allows the company to optimise its usefulness, improving its energy provision service. Elia is collaborating with Sky Futures, a drone inspection service provider.
This partnership sees the loading of collected data to Expanse, a cloud-based data science-driven asset management software platform.
Here it enables the mass of data recorded by Sky Futures to be processed. Defects or anomalies, such as corrosion, can then be highlighted. This allows Elia to focus on repair and maintenance efforts in the relevant area. Not only does this make inspections more efficient and quicker but it also allows Elia to provide a reliable, high-quality service.
Data Science is Improving Healthcare Provision
Healthcare is a data-heavy service. A survey by the Ponemon Institute revealed that this sector stores 30% of global data. This includes data produced by clinical trials, medical records, genetic information, scientific articles, research, and costs. This is only going to increase. Around 72% of people admit to looking up their symptoms or healthcare information online. Increasingly people are using smart tools such as Zocdoc to book appointments or communicate with healthcare providers.
Preventing and Predicting Disease
Omada Health is a digital therapeutics company. They use smart devices to create personalised behaviour and healthcare plans. These personalised plans are tailored to help each individual manage a chronic condition such as diabetes or high cholesterol as effectively as possible.
Meanwhile, Awake Labs, a Canadian startup, is using wearable devices to generate and track data created by autistic patients. Focusing on younger patients, these wearables are able to alert parents or caregivers before a meltdown occurs.
Improving Diagnosis Rates
Data science can help to improve diagnosis rates. Around 12 million Americans are misdiagnosed each year. Meanwhile, an article by the BBC estimated that diagnostic errors can cause between 40,000 and 80,000 deaths every year. Data science and smart tools can help to reduce these misdiagnosis rates. Researchers at Stanford University have developed a data-driven model capable of diagnosing irregular heart rhythms from ECGs.
These algorithms can make diagnoses far more quickly and reliably than a cardiologist. Similarly, another model can determine whether a skin mark or lesion is benign or malignant. These applications reduce the chance of human error, meaning that a diagnosis is more reliable. It also means that disease can be diagnosed more quickly, so treatment can begin sooner.
Iquity is a predictive analytics healthcare platform. Using data science, a pilot study analysed over 4 million data points generated by over 20 million New Yorkers. The pilot focused on patients tested for multiple sclerosis, combined with those who had been misdiagnosed. Iquity was able to predict the onset of MS with 90% accuracy. It was also able to detect signs of the disease 8 months before traditional tools such as spinal tapping revealed the symptoms.
Medication and Treatment Provision
Data science can help to produce useful information on a patient’s unique characteristics. This allows for a more personalised care plan to be derived. It also allows for a more precise medical prescription to be assigned. The 1000 Genomes Project is using data science and sophisticated tools to progress the study of the human genome.
Similar open source initiatives are also helping to drive research into common ailments such as diabetes and heart disease. As scientists learn more about how these diseases progress and develop they are increasingly realising that a one size fits all treatment policy doesn’t work.
Similarly, researchers at Mount Sinai are using biomarkers and genomic data to segment types of bladder cancer. This allows them to identify forms of bladder cancer resistant to chemotherapy. Patients in these segments are then offered other treatment methods that may prove more effective. This application of data science is helping to improve and personalise healthcare provision.
Data Science is Helping to Create a More Personalised World
Data helps businesses, researchers, developers and organisations understand their end goals. In business, this may be how best to meet customers’ needs, while in a lab it may be how to quickly and reliably combat a disease. Data science allows users to manage huge streams of data, converting it into accessible and useful information.
As technology becomes more prevalent, the amount of data we produce will only increase. Consequently, data science is only going to grow in importance, helping us create a more personalised life experience.