Analytics and Public Cloud Platforms
Date: 27/09/2018

By Ahmed Eltoukhy, Senior Consultant at Cornerstone Performance Management

According to Gartner forecasts, the infrastructure-as-a-service (IaaS) market will reach USD 40.8 billion by the end of the year and is projected to reach USD 83.5 billion by 2021. Platform-as-a-service (PaaS) is expected to grow to USD 15 billion by the end of 2018 and reach USD 27.3 billion by 2021. Currently, Amazon Web Services dominates the public cloud market, with Microsoft Azure and Google Cloud Platform following closely in market penetration and adoption rates. Google is reportedly planning to develop three underwater cables in 2019 to bring its Cloud services to new regions.

We receive many enquiries around public cloud offerings and recently I attended the Google Cloud Summit in Sydney. It was a great opportunity to discover new technologies and see what’s happening in the market.

There was a lot to take in. Individually, each technology and product warranted more research and reading, from Cloud Storage, Pub/Sub, Dataflow, Dataprep, Dataproc, Datalab and Data Studio to ML APIs and AutoML. Frankly, it will require more than one blog post to capture all the ideas and concepts discussed. To kick things off, this post covers the overall Google Cloud offering in terms of Information Management and Prediction/AI.

Google Cloud offerings for the Data & Machine Learning Platform

Google provides a wide range of services and technologies on their Cloud to support the Data Platform, Analytics and Artificial Intelligence (AI) fields.

  • Data Ingestion
This is the first step for any Analytics and AI/Prediction model. Data Ingestion is simply the process of data extraction and acquisition into the model.

With Google Cloud, the Data Ingestion process is further categorised into two types:

  1. Message- or event-generated data refers to data generated by an event or an alert, such as website hits, connection initiation or opening a specific mobile app. In all these examples, data is not generated regularly or by any well-defined recurring business process, but by a trigger or an event that is unpredictable.

This kind of data is ingested into Google Cloud using a tool named Cloud Pub/Sub.

  2. Batch data is any other data that isn’t event-triggered, that is, your regular transactional data, logs, files etc.

This kind of data is ingested by Google Cloud Storage.
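The split between the two ingestion types can be sketched in a few lines of plain Python. This is only an illustration of the routing rule described above: the `Source` class, the `ingestion_service` function and the example source names are hypothetical and are not part of any Google Cloud library.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A hypothetical description of a data source (not a Google Cloud type)."""
    name: str
    event_triggered: bool  # True if records appear in response to unpredictable events


def ingestion_service(source: Source) -> str:
    """Return the ingestion service this post associates with the kind of data."""
    # Event or message data goes through Cloud Pub/Sub; everything else is batch.
    return "Cloud Pub/Sub" if source.event_triggered else "Cloud Storage"


# Example sources, loosely based on the examples given in the text.
sources = [
    Source("website_hits", event_triggered=True),
    Source("mobile_app_opens", event_triggered=True),
    Source("nightly_transactions_export", event_triggered=False),
    Source("server_log_files", event_triggered=False),
]

routing = {s.name: ingestion_service(s) for s in sources}
for name, service in routing.items():
    print(f"{name} -> {service}")
```

In a real project the decision is rarely this binary, but asking "is this data triggered by an unpredictable event?" is a reasonable first question.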

  • Processing/Transformation

As the name suggests, Transformation is the process of massaging and preparing data to make it suitable for business decision-making.

  1. Dataflow is the data pipelining, processing and transformation service (ETL) capable of handling huge volumes and a wide variety of both real-time and historical/batch data.
  2. Dataproc is the service to configure, manage and run Big Data clusters (namely Apache Hadoop and Apache Spark) whilst easily integrating with other Google Cloud Platform (GCP) services.
  3. Dataprep is a visual service for data exploration, cleansing and preparation. Think of it as a light-weight Dataflow. Actually, a Dataprep job can be converted to a Dataflow job, but not the other way around.
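
To make the idea of one transformation handling both batch and real-time data more concrete, here is a minimal sketch in plain Python. It does not use the Dataflow service or its SDK; the function names and sample records are made up for illustration, and a generator stands in for a real event stream.

```python
from typing import Iterable, Iterator


def clean(records: Iterable[dict]) -> Iterator[dict]:
    """Drop records without a user id and normalise the country code."""
    for record in records:
        if not record.get("user_id"):
            continue  # cleansing step: skip incomplete rows
        yield {**record, "country": record.get("country", "").strip().upper()}


def count_by_country(records: Iterable[dict]) -> dict:
    """Aggregate the cleaned records into a count per country."""
    counts: dict = {}
    for record in records:
        counts[record["country"]] = counts.get(record["country"], 0) + 1
    return counts


# Batch input: a list that already exists in full (e.g. rows read from a file).
batch = [
    {"user_id": "u1", "country": " au "},
    {"user_id": "", "country": "nz"},
    {"user_id": "u2", "country": "AU"},
]


def event_stream() -> Iterator[dict]:
    """Streaming input: records arrive one at a time (simulated with a generator)."""
    yield {"user_id": "u3", "country": "nz"}
    yield {"user_id": "u4", "country": "au"}


# The same two steps run unchanged over both kinds of input.
batch_counts = count_by_country(clean(batch))
stream_counts = count_by_country(clean(event_stream()))
print(batch_counts)
print(stream_counts)
```

The point of the sketch is that the transformation logic is written once and the input can be either a finished file or a live feed; a managed service adds the scaling, scheduling and fault tolerance that this toy version ignores.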
  • Store & Analyse

Once processed, the data is stored in an analytics-friendly format (most often a Data Warehouse) to allow for rapid data analysis and exploration.

    1. BigQuery is the Google Cloud Data Warehouse technology, supporting efficient analytics and exploration over large volumes of data.
    2. Bigtable is the highly scalable NoSQL (Wide Column Store model) Big Data database service used by many Google services, including Search, Maps and Gmail.

You can see that there is a relationship between the data ingestion and storage processes. Furthermore, as Google has various data storage options, it has put together a flow diagram to help users choose the most suitable storage service based on their specific business requirements.
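
As a rough idea of what such a decision flow looks like, the sketch below encodes a heavily simplified version in Python. It is not Google's diagram: the three questions, the function name `suggest_storage` and the order of the checks are assumptions made for illustration, and only the storage services mentioned in this post are covered.

```python
def suggest_storage(structured: bool, analytical: bool, low_latency: bool) -> str:
    """Suggest a storage service from three yes/no questions (simplified assumption)."""
    if not structured:
        # Files, logs and other raw batch data.
        return "Cloud Storage"
    if low_latency:
        # Very large volumes that need fast reads and writes.
        return "Bigtable"
    if analytical:
        # Data warehouse style analysis and exploration.
        return "BigQuery"
    return "another service (not covered in this post)"


raw_files = suggest_storage(structured=False, analytical=False, low_latency=False)
warehouse = suggest_storage(structured=True, analytical=True, low_latency=False)
fast_lookup = suggest_storage(structured=True, analytical=True, low_latency=True)
print(raw_files, warehouse, fast_lookup, sep="\n")
```

The real diagram covers more services and more questions, so treat this only as a way of reading it: each branch is a question about the data and how it will be used.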

  • Visualise

The next step after data storage and analysis is to visualise the results, making them easier to grasp, understand and act upon.

  1. Data Studio is a free tool (for now!) for creating simple, light-weight visualisations based on a variety of data sources and sharing the results with individuals or teams around the world. Data Studio reports are saved on Google Drive, so users can share them in the same way they share any other Google Drive file.
  2. Datalab is an interactive tool to explore and visualise data by allowing the user to connect to various Google Cloud services (BigQuery, ML Engine, Google Compute Engine and Google Cloud Storage) and to build machine learning and prediction models using TensorFlow. Put simply, it is the Notebook server on Google Cloud, based on Jupyter.
  • Learn & Recommend

Everyone is talking about Artificial Intelligence (AI) and Machine Learning (ML). They’ve become a hot topic in the last few years. We all want to reach the stage where the computer can learn from our data and make reliable and trustworthy recommendations of what to do next.

Google Cloud has different services for Generic and Custom ML models, each with a different set of capabilities and complexity levels.

Criteria | Generic ML | Highly Custom ML
Pros | Available out of the box, ready-made and easy to use | Can be completely customised to your specific requirements of prediction and recommendation
Cons | Designed for generic use | Requires unique skill set and resources to implement and support, must be worth the effort and investment
Google Cloud Services | ML APIs (Vision, Video, Translate, Natural Language, Text to Speech, Speech to Text) | TensorFlow, Kubeflow, Spark, Beam, Caffe etc.


Moreover, Google has created AutoML (still in beta), a tool with a wizard-based, user-friendly interface for building custom models. Its aim is to enable developers with limited machine learning expertise and technical knowledge to train ML models specific to their needs, by building on Google’s ML APIs.

In effect, Google is introducing a mid-level ML option that allows a degree of customisation on top of the existing generic ML APIs, to meet the growing need for AI and prediction. For example, the generic Vision API can be used to identify images with cats or dogs in them. Building on the Vision API, a custom ML model can be trained to estimate the age of a cat in an image.
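
The three levels (generic, mid-level and highly custom) can be summarised as a small decision helper. This is a sketch of the reasoning in this section rather than official guidance: the function name `suggest_ml_option` and its two yes/no inputs are assumptions chosen for illustration.

```python
def suggest_ml_option(generic_labels_suffice: bool, has_ml_team: bool) -> str:
    """Pick an ML approach from two yes/no questions (illustrative assumption)."""
    if generic_labels_suffice:
        # Ready-made models are enough, e.g. "is there a cat or a dog in this image?"
        return "ML APIs"
    if has_ml_team:
        # Full control, at the cost of specialist skills and ongoing support.
        return "Custom model (e.g. TensorFlow)"
    # Some customisation is needed, but ML expertise is limited.
    return "AutoML"


cat_or_dog = suggest_ml_option(generic_labels_suffice=True, has_ml_team=False)
cat_age = suggest_ml_option(generic_labels_suffice=False, has_ml_team=False)
cat_age_in_house = suggest_ml_option(generic_labels_suffice=False, has_ml_team=True)
print(cat_or_dog, cat_age, cat_age_in_house, sep="\n")
```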

Take a look at our analytics maturity curve to get a picture of how your business is progressing along the AI journey.