A Moment of Clarity in Data Science

Feb 2018

A few weeks ago, someone asked: “How will big data impact data warehousing?” Before answering this question, I feel compelled to clarify a few common misconceptions about big data and data science in general. We sometimes confuse Online Transaction Processing (OLTP) with Online Analytical Processing (OLAP); data warehouses with data lakes; big data with IoT; and machine learning with artificial intelligence. Although some of these tools overlap, they are not interchangeable.

OLTP systems, typically relational databases, handle structured, real-time data such as order entries, financial transactions, and retail sales. OLAP complements the relational database by running analyses and reports on the stored data. Data warehouses let you extract insights from your data, but they are largely limited to internal business analysis, or business intelligence. Traditional data warehouses also grew more expensive to maintain as processing volumes rose. Newer warehouses like Microsoft’s Azure SQL Data Warehouse have changed that picture, bringing in big data tools like Apache Hadoop, which offers a cost-effective way to manage massive volumes of data that conventional database systems can’t handle.
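
To make the OLTP/OLAP distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the function names are hypothetical and chosen only for illustration; a production system would normally run the analytical side on a separate warehouse rather than on the transactional database.

```python
import sqlite3

# Hypothetical in-memory "orders" table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)


def record_order(region, amount):
    """OLTP-style work: one small write, committed as a transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )


def sales_by_region():
    """OLAP-style work: scan and aggregate many rows for reporting."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


record_order("east", 120.0)
record_order("west", 80.0)
record_order("east", 30.0)
report = sales_by_region()
```

The writes touch one row at a time, while the report reads every row to produce a summary. That difference in access pattern is why the two workloads are usually served by different systems.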

Data lakes are a successor to warehouses, expanding how much data a repository can store. Data lakes can take on structured, semi-structured, and unstructured data and are designed for low-cost storage of much higher volumes. Data lakes are also highly agile and can be reconfigured as needed. Data warehouses and data lakes resemble each other in that both hold large amounts of data for analysis and reporting. The main difference, according to Gartner research director Nick Heudecker, is that data lakes help uncover new uses and opportunities in your data. The trade-off is that data lakes are not as simple to navigate for basic reporting as data warehouses, because of their “Schema-on-Read” approach. With this approach, data sits in its raw format until it is needed, and structure is applied only when the data is read, so there is no tedious preparation before storing it.
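
As a rough illustration of schema-on-read, the sketch below stores a few raw JSON records untouched and applies a schema only at query time. The sample records, field names, and the read_with_schema helper are all assumptions made up for this example, not part of any particular data lake product.

```python
import json

# Hypothetical raw events as they might land in a data lake: stored as-is,
# with no shared structure enforced at write time.
raw_lake = [
    '{"user": "ana", "amount": "19.99", "channel": "web"}',
    '{"user": "ben", "note": "called support about billing"}',
    '{"user": "cy", "amount": 5, "channel": "store", "coupon": "SPRING"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep only records that have every
    requested field, and cast each field to the requested type."""
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: cast(record[field]) for field, cast in schema.items()}


# Two different questions, two different schemas, same untouched raw data.
sales = list(read_with_schema(raw_lake, {"user": str, "amount": float}))
notes = list(read_with_schema(raw_lake, {"user": str, "note": str}))
```

Nothing had to be cleaned or modeled before it was stored, which keeps ingestion cheap. The cost shows up later: every reader has to decide which fields it needs and how to interpret them.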

Now, back to the question: “How will big data impact data warehousing?” Big data is continuously flowing and growing, and it consists of enormous amounts and many types of data, mostly unstructured or human-generated (think social media or contact center interactions). IoT data, on the other hand, is machine-generated and comes from vast numbers of connected sources such as smartphones and sensors. These proliferating data types and sources present a challenge for conventional data infrastructures, and changing demands for data analysis fuel the need for technologies that can handle more advanced requirements. As a result, leading companies are modernizing their systems with dedicated, robust applications that are ready for big data and IoT. To that end, let’s look at the distinctions between, and opportunities within, machine learning and its underlying algorithms, and at how machine learning fuels artificial intelligence. With the help of machine learning, natural language processing and natural language understanding come to life.

According to Divan Dave, CEO of OmniMD, natural language processing is “When AI is trained to interpret human communication.” NLP is one of many machine learning applications; it can read text, such as free-text comments, to detect customer sentiment across channels. Using tools such as Python and R, developers can write programs that categorize and tag words, classify text, extract information from text, analyze sentence structure, and more. Going further and interpreting the meaning behind that sentence structure requires natural language understanding.
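
The tagging and classification steps can be sketched in a few lines of standard-library Python. This is a toy, assuming two tiny hand-made word lists (POSITIVE and NEGATIVE) and hypothetical function names; real sentiment systems learn these associations from labeled data rather than from a fixed lexicon.

```python
import re
from collections import Counter

# Hypothetical, hand-made word lists; a real system would learn these from data.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "rude", "refund"}


def tokenize(text):
    """Split free text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tag_words(tokens):
    """Tag each token as pos, neg, or neutral using the word lists."""
    def tag(token):
        if token in POSITIVE:
            return "pos"
        if token in NEGATIVE:
            return "neg"
        return "neutral"
    return [(token, tag(token)) for token in tokens]


def classify(text):
    """Classify a message by comparing positive and negative tag counts."""
    counts = Counter(label for _, label in tag_words(tokenize(text)))
    if counts["pos"] > counts["neg"]:
        return "positive"
    if counts["neg"] > counts["pos"]:
        return "negative"
    return "neutral"


messages = [
    "Love the new app, support was fast and helpful!",
    "Checkout is broken and the agent was rude.",
    "I have a question about my invoice.",
]
labels = [classify(m) for m in messages]
```

Each message passes through the same pipeline of tokenizing, tagging, and classifying, which mirrors the categorize, tag, and classify tasks described in the paragraph.
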
NLU goes beyond recognizing words and interprets meaning. It can understand meaning in spite of human errors like mispronunciations or transposed letters or words. Traditional approaches to NLP have evolved with the use of deep learning. Deep learning attempts to mimic the activity of the parts of the human brain where thinking happens: the software learns to recognize patterns in digital representations of sounds, images, and other data (think Google search and smartphone voice commands).
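
One small piece of that error tolerance, coping with transposed letters, can be approximated with fuzzy string matching. The sketch below uses difflib from the Python standard library. It is not NLU or deep learning, only an illustration of mapping a misspelled word to an intended meaning; the intent names, keyword table, and cutoff value are assumptions.

```python
import difflib

# Hypothetical intent vocabulary for a support bot.
INTENT_KEYWORDS = {
    "refund": "request_refund",
    "cancel": "cancel_order",
    "invoice": "billing_question",
}


def detect_intent(message, cutoff=0.8):
    """Return the intent of the first word that approximately matches a
    known keyword, or None when nothing is close enough."""
    for word in message.lower().split():
        word = word.strip(".,!?")
        match = difflib.get_close_matches(
            word, list(INTENT_KEYWORDS), n=1, cutoff=cutoff
        )
        if match:
            return INTENT_KEYWORDS[match[0]]
    return None


typo_intent = detect_intent("Please cnacel my order")
clean_intent = detect_intent("I need my invoice.")
no_intent = detect_intent("Where is the store?")
```

The misspelled "cnacel" still resolves to the cancel intent because it is close enough to the keyword. A real NLU system would also weigh context and sentence structure, which simple string similarity cannot do.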

Hopefully, it’s more evident now that machine learning and artificial intelligence are not interchangeable. Instead, ML is an essential element that makes AI possible. If AI is the science of training machines to carry out jobs that traditionally required human intelligence, then machine learning is a set of techniques that can be applied to massive amounts and types of data to analyze it and make smart decisions about what actions machines should take. Demand for AI technology that elevates customer and user experiences through intelligent interactions is escalating. A few months back, our post on artificial intelligence mentioned some ways in which AI is saving lives in healthcare and improving operations on the plant floor in manufacturing. In my next post, I’ll look at how AI can help innovate processes in the financial services industry. Learn more about how we can help you manage the surge of data for the best insights.

References:
http://searchbusinessanalytics.techtarget.com/definition/natural-language-processing-NLP
http://whatis.techtarget.com/definition/natural-language-understanding
https://www.technologyreview.com/s/513696/deep-learning/