GCP has 200+ services
This exam expects knowledge of 40+ services
Exam tests your decision making abilities:
Which service do you choose in which situation?
This course is designed to help you make these choices
Our Goal : Help you start your cloud journey AND get certified
Challenging certification - Expects you to understand and REMEMBER a number of services
As time passes, humans forget things. How do you improve your chances of remembering things?
Active learning - think and take notes
Review the notes every once in a while
Challenge:
- Peak usage during holidays and weekends
- Less load during rest of the time
- Solution (before the Cloud):
- PEAK LOAD provisioning : Procure (Buy) infrastructure for peak load
- What would the infrastructure be doing during periods of low loads?
- Startup suddenly becomes popular
- How to handle the sudden increase in load?
- Solution (before the Cloud):
- Procure (Buy) infrastructure assuming they would be successful
- What if they are not successful?
High cost of procuring infrastructure
Needs ahead-of-time planning (Can you guess the future?)
Low infrastructure utilization (PEAK LOAD provisioning)
Dedicated infrastructure maintenance team (Can a startup afford it?)
How about provisioning (renting) resources when you want them and releasing them back when you do not need them?
On-demand resource provisioning. Also called Elasticity.
Trade "capital expense" for "variable expense"
Benefit from massive economies of scale
Stop guessing capacity
Stop spending money running and maintaining data centers
"Go global" in minutes
GCP is one of the Top 3 cloud service providers
Provides a number of services (200+)
Reliable, secure and highly-performant:
Infrastructure that powers 8 services with over 1 Billion Users: Gmail, Google Search, YouTube etc
One thing I love : "cleanest cloud"
Net carbon-neutral cloud (electricity used matched 100% with renewable energy)
The entire course is all about GCP. You will learn it as we go further.
Cloud applications make use of multiple GCP services
There is no single path to learn these services independently. HOWEVER, we've worked out a simple path!
Create GCP Account
Regions and Zones
Imagine that your application is deployed in a data center in London. What would be the challenges?
Challenge 1 : Slow access for users from other parts of the world (high latency)
Challenge 2 : What if the data center crashes?
Your application goes down (low availability)
Let's add in one more data center in London. What would be the challenges?
Challenge 1 : Slow access for users from other parts of the world
Challenge 2 (SOLVED) : What if one data center crashes?
Your application is still available from the other data center
Challenge 3 : What if entire region of London is unavailable?
Your application goes down
Let's add a new region: Mumbai. What would be the challenges?
Challenge 1 (PARTLY SOLVED) : Slow access for users from other parts of the world
You can solve this by adding deployments for your applications in other regions
Challenge 2 (SOLVED) : What if one data center crashes?
Your application is still live from the other data centers
Challenge 3 (SOLVED) : What if entire region of London is unavailable?
Your application is served from Mumbai
Imagine setting up data centers in different regions around the world
Would that be easy?
Solution
- Google provides 20+ regions around the world
- Expanding every year
- Region: Specific geographical location to host your resources
- Advantages: High Availability
- Low Latency
- Global Footprint
- Adhere to government regulations
How to achieve high availability in the same region (or geographic location)?
Enter Zones
Each Region has three or more zones
(Advantage) Increased availability and fault tolerance within same region
(Remember) Each Zone has one or more discrete clusters
Cluster : distinct physical infrastructure that is housed in a data center
(Remember) Zones in a region are connected through low-latency links
Compute
In corporate data centers, applications are deployed to physical servers
Where do you deploy applications in the cloud?
Rent virtual servers
Virtual Machines - Virtual servers in GCP
Google Compute Engine (GCE) - Provision & Manage Virtual Machines
Create and manage lifecycle of Virtual Machine (VM) instances
Load balancing and auto scaling for multiple VM instances
Attach storage (& network storage) to your VM instances
Manage network connectivity and configuration for your VM instances
Our Goal:
- Setup VM instances as HTTP (Web) Server
- Distribute load with Load Balancers
Let's create a few VM instances and play with them
Let's check out the lifecycle of VM instances
Let's use SSH to connect to VM instances
Commands:
- sudo su - execute commands as a root user
- apt update - Update package index - pull the latest changes from the APT repositories
- apt -y install apache2 - Install apache 2 web server
- sudo service apache2 start - Start apache 2 web server
- echo "Hello World" > /var/www/html/index.html - Write to index.html
- $(hostname) - Get host name
- $(hostname -I) - Get host internal IP address
IP Address types:
- Internal IP Address: Permanent internal IP address that does not change during the lifetime of an instance
- Ephemeral External IP Address: External IP address that changes when an instance is stopped
- Static IP Address: Permanent external IP address that can be attached to a VM
How do we reduce the number of steps in creating a VM instance and setting up an HTTP Server?
Let's explore a few options:
- Startup script
- Instance Template
- Custom Image
Bootstrapping: Install OS patches or software when a VM instance is launched.
In a VM, you can configure a startup script to bootstrap
DEMO - Using Startup script
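As a sketch, the SSH commands above can be combined into a startup script. This is an illustrative example; the script runs as root on first boot when attached to the instance (for example via the console's Automation field or `--metadata-from-file startup-script=startup.sh`):

```shell
#!/bin/bash
# Illustrative startup script: installs Apache and serves a page
# showing the VM's hostname and internal IP (runs as root at boot).
apt update
apt -y install apache2
service apache2 start
echo "Hello World from $(hostname) $(hostname -I)" > /var/www/html/index.html
```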
Why do you need to specify all the VM instance details (image, machine type etc) every time you launch an instance?
How about creating an Instance Template?
Define machine type, image, labels, startup script and other properties
Used to create VM instances and managed instance groups
Provides a convenient way to create similar instances
CANNOT be updated
To make a change, copy an existing template and modify it
(Optional) Image family can be specified (example - debian-9):
Latest non-deprecated version of the family is used
DEMO - Launch VM instances using Instance templates
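A minimal sketch of creating and using an instance template with gcloud (the names, zone and image family below are placeholder assumptions):

```shell
# Create a reusable template (machine type, image and startup script baked in)
gcloud compute instance-templates create my-web-template \
    --machine-type=e2-micro \
    --image-family=debian-12 --image-project=debian-cloud \
    --metadata-from-file=startup-script=startup.sh

# Launch a VM from the template without re-specifying the details
gcloud compute instances create my-web-server \
    --source-instance-template=my-web-template --zone=us-central1-a
```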
Installing OS patches and software at launch of VM instances
increases boot up time
How about creating a custom image with OS patches and software pre-installed?
Can be created from an instance, a persistent disk, a snapshot, another
image, or a file in Cloud Storage
Can be shared across projects
(Recommendation) Deprecate old images (& specify replacement image)
(Recommendation) Hardening an Image - Customize images to your corporate security standards
Prefer using Custom Image to Startup script
DEMO : Create a Custom Image and using it in an Instance Template
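A hedged sketch of the gcloud side of this demo (all resource names are placeholders):

```shell
# Create a custom image from the persistent disk of a configured VM
gcloud compute images create my-hardened-image \
    --source-disk=my-web-server --source-disk-zone=us-central1-a \
    --family=my-web-family

# Later, deprecate the old image and point users at a replacement
gcloud compute images deprecate my-old-image \
    --state=DEPRECATED \
    --replacement=projects/my-project/global/images/my-hardened-image
```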
Automatic discounts for running VM instances for significant portion of the billing month
Example: If you use N1, N2 machine types for more
than 25% of a month, you get a 20% to 50% discount on every incremental minute.
Discount increases with usage. No action required on your part!
Applicable for instances created by Google Kubernetes Engine and Compute Engine
RESTRICTION: Does NOT apply to certain machine types (example: E2 and A2)
RESTRICTION: Does NOT apply to VMs created by App Engine flexible and Dataflow
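To build intuition, here is a small Python sketch of how such tiered, automatic discounts accumulate. The tier values are an illustrative assumption (modeled on the commonly documented N1 schedule of 100%/80%/60%/40% of the base rate per quarter of the month); check the current pricing docs for exact numbers.

```python
# Illustrative sustained use discount model (tier values are assumptions,
# NOT official pricing): each 25% block of the month is billed at a
# progressively lower incremental rate.
TIERS = [1.0, 0.8, 0.6, 0.4]  # fraction of base rate per 25% block of the month

def effective_cost(base_monthly_cost: float, fraction_of_month_used: float) -> float:
    """Cost after sustained use discounts for a VM running this fraction of the month."""
    cost = 0.0
    for i, rate in enumerate(TIERS):
        block_start = i * 0.25
        # how much of this 25% block was actually used
        block = min(max(fraction_of_month_used - block_start, 0.0), 0.25)
        cost += base_monthly_cost * block * rate
    return cost

print(effective_cost(100.0, 1.0))   # full month -> 70.0
print(effective_cost(100.0, 0.25))  # quarter month -> 25.0 (no discount yet)
```

Under these assumed tiers, a VM running the full month pays 70% of the base price, i.e. an effective 30% discount, without any action on your part.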
For workloads with predictable resource needs
Commit for 1 year or 3 years
Up to 70% discount based on machine type and GPUs
Applicable for instances created by Google Kubernetes Engine and
Compute Engine
(Remember) You CANNOT cancel commitments
Reach out to Cloud Billing Support if you made a mistake while purchasing commitments
Short-lived cheaper (up to 80%) compute instances
Can be stopped by GCP any time (preempted) within 24 hours
Instances get a 30 second warning (to save anything they want to save)
Use Preemptible VMs if:
- Your applications are fault tolerant
- You are very cost sensitive
- Your workload is NOT immediate
- Example: Non immediate batch processing jobs
RESTRICTIONS:
- NOT always available
- NO SLA and CANNOT be migrated to regular VMs
- NO Automatic Restarts
- Free Tier credits not applicable
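As a sketch, a preemptible VM is requested with a single flag; a shutdown script is one way to use the 30 second warning (the names and files below are placeholders):

```shell
# Launch a cheaper, preemptible VM for fault-tolerant batch work.
# cleanup.sh runs (best effort) when GCP preempts the instance.
gcloud compute instances create my-batch-worker \
    --zone=us-central1-a \
    --preemptible \
    --metadata-from-file=shutdown-script=cleanup.sh
```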
Shared Tenancy (Default)
Single host machine can have instances from multiple customers
Sole-tenant Nodes: Virtualized instances on hardware dedicated to one customer
Use cases:
- Security and compliance requirements: You want your VMs to be physically separated from those in other projects
- High performance requirements: Group your VMs together
- Licensing requirements: Using per-core or per-processor "Bring your own licenses"
What do you do when predefined VM options are NOT appropriate for your workload?
Create a machine type customized to your needs (a Custom Machine Type)
Custom Machine Type: Adjust vCPUs, memory and GPUs
Choose between E2, N2, or N1 machine types
Supports a wide variety of Operating Systems: CentOS, CoreOS, Debian, Red Hat, Ubuntu, Windows etc
Billed per vCPUs, memory provisioned to each instance
Example Hourly Price: $0.033174 / vCPU + $0.004446 / GB
2 primary costs in running VMs using GCE:
- Infrastructure cost to run your VMs
- Licensing cost for your OS (ONLY for Premium Images)
Premium Image Examples: Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server (SLES), Ubuntu Pro, Windows Server, ..
Options For Licensing:
- You can use Pay-as-you-go model (PAYG) OR
- (WITHIN A LOT OF CONSTRAINTS) You can use your existing license/subscription (Bring your own subscription/license - BYOS/BYOL)
(RECOMMENDED) If you have existing license for a premium image, use it while your license is valid
After that you can shift to Pay-as-you-go model (PAYG)
Image
- What operating system and what software do you want on the VM instance?
- Reduce boot time and improve security by creating custom hardened Images.
- You can share an Image with other projects
- Machine Types
- Optimized combination of compute(CPU, GPU), memory, disk (storage) and networking for specific workloads.
- You can create your own Custom Machine Types when existing ones don't fit your needs
- Static IP Addresses: Get a constant IP address for VM instances
- Instance Templates: Pre-configured templates simplifying the creation of VM instances
- Sustained use discounts: Automatic discounts for running VM instances for significant portion of the billing month
- Committed use discounts: 1 year or 3 year reservations for workloads with predictable resource needs
- Preemptible VM: Short-lived cheaper (up to 80%) compute instances for non-time-critical fault-tolerant workloads
- How do you create a group of VM instances?
- Instance Group - Group of VM instances managed as a single entity
- Manage group of similar VMs having similar lifecycle as ONE UNIT
- Two Types of Instance Groups:
- Managed : Identical VMs created using a template:
- Features: Auto scaling, auto healing and managed releases
- Unmanaged : Different configuration for VMs in same group:
- Does NOT offer auto scaling, auto healing & other services
- NOT Recommended unless you need different kinds of VMs
- Location can be Zonal or Regional
- Regional gives you higher availability (RECOMMENDED)
Managed Instance Group - Identical VMs created using an instance template
Important Features:
- Maintain certain number of instances
- If an instance crashes, MIG launches another instance
- Detect application failures using health checks (Self Healing)
- Increase and decrease instances based on load (Auto Scaling)
- Add Load Balancer to distribute load
- Create instances in multiple zones (regional MIGs)
- Regional MIGs provide higher availability compared to zonal MIGs
- Release new application versions without downtime
- Rolling updates: Release new version step by step (gradually). Update a percentage of instances to the new version at a time.
- Canary Deployment: Test new version with a group of instances before releasing it across all instances.
Instance template is mandatory :
- Configure auto-scaling to automatically adjust number of instances based on load:
- Minimum number of instances
- Maximum number of instances
- Autoscaling metrics: CPU utilization target, Load Balancer utilization target, or any other metric from Stackdriver (Cloud Monitoring)
- Cool-down period: How long to wait before looking at auto scaling metrics again?
- Scale In Controls: Prevent a sudden drop in number of VM instances
- Example: Don't scale in by more than 10% or 3 instances in 5 minutes
- Autohealing: Configure a Health check with Initial delay (How long should you wait for your app to initialize before running a health check?)
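The MIG settings above map to gcloud roughly as follows (an illustrative sketch; names and thresholds are placeholders):

```shell
# Create a managed instance group of 2 identical VMs from a template
gcloud compute instance-groups managed create my-mig \
    --template=my-web-template --size=2 --zone=us-central1-a

# Autoscale between 2 and 10 instances targeting 70% CPU utilization,
# with a 120 second cool-down period between scaling decisions
gcloud compute instance-groups managed set-autoscaling my-mig \
    --zone=us-central1-a \
    --min-num-replicas=2 --max-num-replicas=10 \
    --target-cpu-utilization=0.7 --cool-down-period=120
```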
Distribute traffic across VM instances in one or more regions
Managed service:
Google Cloud ensures that it is highly available
Auto scales to handle huge loads
Load Balancers can be public or private
Types:
External HTTP(S)
Internal HTTP(S)
SSL Proxy
TCP Proxy
External Network TCP/UDP
Internal TCP/UDP
Managed Services
Do you want to continue running applications in the cloud, the same way you run them in your data center?
OR are there OTHER approaches?
You should understand some terminology used with cloud services:
IaaS (Infrastructure as a Service)
PaaS (Platform as a Service)
FaaS (Function as a Service)
CaaS (Container as a Service)
Serverless
Let's get on a quick journey to understand these!
Use only infrastructure from cloud provider
Example: Using VM to deploy your applications or databases
You are responsible for:
- Application Code and Runtime
- Configuring load balancing
- Auto scaling
- OS upgrades and patches
- Availability
- etc.. (and a lot of things!)
Use a platform provided by cloud
Cloud provider is responsible for:
- OS (incl. upgrades and patches)
- Application Runtime
- Auto scaling, Availability & Load balancing etc..
You are responsible for:
- Configuration (of Application and Services)
- Application code (if needed)
Varieties:
- CAAS (Container as a Service): Containers instead of Apps
- FAAS (Function as a Service): Functions instead of Apps
- Databases - Relational & NoSQL (Amazon RDS, Google Cloud SQL, Azure SQL Database etc), Queues, AI, ML, Operations etc!
Enterprises are heading towards microservices architectures
Build small focused microservices
Flexibility to innovate and build applications in different programming languages (Go, Java, Python, JavaScript, etc)
BUT deployments become complex!
How can we have one way of deploying Go, Java, Python or JavaScript .. microservices?
Enter containers!
Create Docker images for each microservice
Docker image has all needs of a microservice:
Application Runtime (JDK or Python or NodeJS)
Application code and Dependencies
Runs the same way on any infrastructure:
Your local machine
Corporate data center
Cloud
Advantages
Docker containers are light weight
Compared to Virtual Machines, as they do not have a Guest OS
Docker provides isolation for containers
Docker is cloud neutral
Requirement : I want 10 instances of Microservice A container, 15 instances of Microservice B container and ....
Typical Features:
- Auto Scaling - Scale containers based on demand
- Service Discovery - Help microservices find one another
- Load Balancer - Distribute load among multiple instances of a microservice
- Self Healing - Do health checks and replace failing instances
- Zero Downtime Deployments - Release new versions without downtime
What do we think about when we develop an application?
Where to deploy? What kind of server? What OS?
How do we take care of scaling and availability of the application?
What if you don't need to worry about servers and focus on your code?
Enter Serverless
Remember: Serverless does NOT mean "No Servers"
Serverless for me:
You don't worry about infrastructure (ZERO visibility into infrastructure)
Flexible scaling and automated high availability
Most Important: Pay for use
Ideally ZERO REQUESTS => ZERO COST
You focus on code and the cloud managed service takes care of all that is needed to scale your code to serve millions of requests!
And you pay for requests and NOT servers!
Centrally hosted software (mostly on the cloud)
Offered on a subscription basis (pay-as-you-go)
Examples:
Email, calendaring & office tools (such as Outlook 365, Microsoft Office 365, Gmail, Google Docs)
Cloud provider is responsible for:
- OS (incl. upgrades and patches)
- Application Runtime
- Auto scaling, Availability & Load balancing etc..
- Application code and/or Application Configuration (How much memory? How many instances? ..)
Customer is responsible for:
- Configuring the software!
- And the content (example: docs, sheets etc)
Security in cloud is a Shared Responsibility:
Between GCP and the Customer
GCP provides features to make security easy:
- Encryption at rest by default
- IAM
- KMS etc
Customer responsibilities vary with the model:
- SaaS: Content + Access Policies + Usage
- PaaS: SaaS + Deployment + Web Application Security
- IaaS: PaaS + Operations + Network Security + Guest OS
Google Cloud is always responsible for Hardware, Network, Audit Logging etc.
App Engine: Build highly scalable applications on a fully managed serverless platform using open and familiar languages and tools
Cloud Functions: Build event driven applications using simple, single-purpose functions
Cloud Run: Develop and deploy highly scalable containerized applications.
Does NOT need a cluster!
Managed Compute Service in GCP
- Simplest way to deploy and scale your applications in GCP
- Provides end-to-end application management
Supports:
- Go, Java, .NET, Node.js, PHP, Python, Ruby using pre-configured runtimes
- Use custom run-time and write code in any language
- Connect to variety of Google Cloud storage products (Cloud SQL etc)
- No usage charges - Pay for resources provisioned
Features:
- Automatic load balancing & Auto scaling
- Managed platform updates & Application health monitoring
- Application versioning
- Traffic splitting
Compute Engine is IAAS
MORE Flexibility, MORE Responsibility:
- Choosing Image
- Installing Software
- Choosing Hardware
- Fine grained Access/Permissions (Certificates/Firewalls)
- Availability etc
App Engine is PaaS and Serverless
LOWER Flexibility, LESSER Responsibility
Standard: Applications run in language specific sandboxes
- V1: Java, Python, PHP, Go (OLD Versions)
- V2: Java, Python, PHP, Node.js, Ruby, Go (NEWER Versions)
- Complete isolation from OS/Disk
- Supports scale down to Zero instances
Flexible - Application instances run within Docker containers
- Makes use of Compute Engine virtual machines
- Support ANY runtime (with built-in support for Python, Java, Node.js, Go, Ruby, PHP, or .NET)
- CANNOT scale down to Zero instances
Managed Kubernetes service
- Minimize operations with auto-repair (repair failed nodes) and auto-upgrade (use latest version of K8S always) features
- Provides Pod and Cluster Autoscaling
- Enable Cloud Logging and Cloud Monitoring with simple configuration
- Uses Container-Optimized OS, a hardened OS built by Google
- Provides support for Persistent disks and Local SSD
Let's Have Some Fun: Let's get on a journey with Kubernetes:
Let's create a cluster, deploy a microservice and play with it in 13 steps!
- Create a Kubernetes cluster with the default node pool: gcloud container clusters create my-cluster --zone us-central1-a (or use cloud console)
- Login to Cloud Shell
- Connect to the Kubernetes cluster: gcloud container clusters get-credentials my-cluster --zone us-central1-a --project solid-course-258105
- Deploy Microservice to Kubernetes
Create deployment & service using kubectl commands :
- kubectl create deployment hello-world-rest-api --image=in28min/hello-world-rest-api:0.0.1.RELEASE
- kubectl expose deployment hello-world-rest-api --type=LoadBalancer --port=8080
Increase number of instances of your microservice:
- kubectl scale deployment hello-world-rest-api --replicas=2
Increase number of nodes in your Kubernetes cluster:
- gcloud container clusters resize my-cluster --node-pool my-node-pool --num-nodes 5
You are NOT happy about manually increasing number of instances and nodes!
Setup auto scaling for your microservice:
- kubectl autoscale deployment hello-world-rest-api --max=10 --cpu-percent=70
Also called horizontal pod autoscaling - HPA - kubectl get hpa
Setup auto scaling for your Kubernetes Cluster
- gcloud container clusters update cluster-name --enable-autoscaling --min-nodes=1 --max-nodes=10
Delete the Microservice
- Delete service - kubectl delete service hello-world-rest-api
- Delete deployment - kubectl delete deployment hello-world-rest-api
Delete the Cluster
- gcloud container clusters delete my-cluster --zone us-central1-a
Cloud Functions
Imagine you want to execute some code when an event happens?
- A file is uploaded in Cloud Storage
- An error log is written to Cloud Logging
- A message arrives to Cloud Pub/Sub
- Enter Cloud Functions
- Run code in response to events
- Write your business logic in Node.js, Python, Go, Java, .NET, and Ruby
- Don't worry about servers or scaling or availability (only worry about your code)
- Pay only for what you use
- Number of invocations
- Compute Time of the invocations
- Amount of memory and CPU provisioned
- Time Bound - Default 1 min and MAX 60 minutes (3600 seconds)
- Each execution runs in a separate instance
- No direct sharing between invocations
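A minimal sketch of such a function in Python (the function name, bucket and deploy command below are illustrative placeholders; for Cloud Storage triggers the event payload carries the object's bucket and name):

```python
# Illustrative background function: reacts to a file uploaded to Cloud Storage.
# In GCP you would deploy it with something like:
#   gcloud functions deploy handle_new_file --runtime=python311 --trigger-bucket=my-bucket
def handle_new_file(event: dict, context=None) -> str:
    bucket = event.get("bucket")
    name = event.get("name")
    message = f"Processing gs://{bucket}/{name}"
    print(message)  # appears in Cloud Logging when running in GCP
    return message

# Exercised locally with a fake event:
print(handle_new_file({"bucket": "my-bucket", "name": "report.csv"}))
# -> Processing gs://my-bucket/report.csv
```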
Cloud Run - "Container to Production in Seconds"
- Built on top of an open standard - Knative
- Fully managed serverless platform for containerized applications
- ZERO infrastructure management
- Pay-per-use (For used CPU, Memory, Requests and Networking)
- Fully integrated end-to-end developer experience:
- No limitations in languages, binaries and dependencies
- Easily portable because of container based architecture
- Cloud Code, Cloud Build, Cloud Monitoring & Cloud Logging Integrations
- Anthos - Run Kubernetes clusters anywhere
- Cloud, Multi Cloud and On-Premise
- Cloud Run for Anthos: Deploy your workloads to Anthos clusters running on-premises or on Google Cloud
- Leverage your existing Kubernetes investment to quickly run serverless workloads
How can you centrally manage multi-cloud and on-premise Kubernetes clusters ?
Anthos
Storage
What is the type of storage of your hard disk?
Block Storage
You've created a file share to share a set of files with your colleagues in an enterprise. What type of storage are you using?
File Storage
Use case: Harddisks attached to your computers
Typically, ONE Block Storage device can be connected to ONE virtual server
(EXCEPTIONS) You can attach read-only block devices to multiple virtual servers, and certain cloud providers are exploring multi-writer disks as well!
HOWEVER, you can connect multiple different block storage devices to one virtual server
Used as:
Direct-attached storage (DAS) - Similar to a hard disk
Storage Area Network (SAN) - High-speed network connecting a pool of storage devices
Used by Databases - Oracle and Microsoft SQL Server
Media workflows need huge shared storage for supporting processes like video editing
Enterprise users need a quick way to share files in a secure and organized way
These file shares are shared by several virtual servers
Block Storage:
Persistent Disks: Network Block Storage
Zonal: Data replicated in one zone
Regional: Data replicated in multiple zones
Local SSDs: Local Block Storage
File Storage:
Filestore:
- High performance file storage
Cloud Storage:
- Most popular, very flexible & inexpensive storage service
- Serverless: Autoscaling and infinite scale
Store large objects using a key-value approach:
- Treats entire object as a unit (Partial updates not allowed)
- Recommended when you operate on entire object most of the time
Access Control at Object level
Also called Object Storage
- Provides REST API to access and modify objects
- Also provides CLI (gsutil) & Client Libraries (C++, C#, Java, Node.js, PHP, Python & Ruby)
- Store all file types - text, binary, backup & archives:
- Media files and archives, Application packages and logs
- Backups of your databases or storage devices
- Staging data during on-premise to cloud database migration
- Objects are stored in buckets
- Bucket names are globally unique
- Bucket names are used as part of object URLs => Can contain ONLY lower case letters, numbers, hyphens, underscores and periods.
- 3-63 characters. Can't start with the goog prefix and should not contain google (even misspelled)
- Unlimited objects in a bucket
- Each bucket is associated with a project
- Each object is identified by a unique key
- Key is unique in a bucket
- Max object size is 5 TB
- BUT you can store unlimited number of such objects
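Buckets and objects are typically managed with the gsutil CLI mentioned later. A minimal illustrative session (the bucket name is a placeholder and must be globally unique):

```shell
# Make a bucket, then upload, list and delete objects
gsutil mb -l us-central1 gs://my-unique-bucket-name
gsutil cp backup.tar.gz gs://my-unique-bucket-name/
gsutil ls gs://my-unique-bucket-name
gsutil rm gs://my-unique-bucket-name/backup.tar.gz
```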
- Different kinds of data can be stored in Cloud Storage
- Media files and archives
- Application packages and logs
- Backups of your databases or storage devices
- Long term archives
- Huge variations in access patterns
- Can I pay a cheaper price for objects I access less frequently?
- Storage classes help to optimize your costs based on your access needs
- Designed for durability of 99.999999999%(11 9’s)
- Storage classes differ in availability and minimum storage duration
- High durability (99.999999999% annual durability)
- Low latency (first byte typically in tens of milliseconds)
- Unlimited storage
- Autoscaling (No configuration needed)
- NO minimum object size
- Same APIs across storage classes
- Committed SLA is 99.95% for multi region and 99.9% for single region for Standard, Nearline and Coldline storage classes
- No committed SLA for Archive storage
- Files are frequently accessed when they are created
- Generally usage reduces with time
- How do you save costs by moving files automatically between storage classes?
- Solution: Object Lifecycle Management
- Identify objects using conditions based on:
- Age, CreatedBefore, IsLive, MatchesStorageClass, NumberOfNewerVersions etc
- Set multiple conditions: all conditions must be satisfied for action to happen
- Two kinds of actions:
- SetStorageClass actions (change from one storage class to another)
- Deletion actions (delete objects)
- Allowed Transitions:
- (Standard or Multi-Regional or Regional) to (Nearline or Coldline or Archive)
- Nearline to (Coldline or Archive)
- Coldline to Archive
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "Delete"},
        "condition": {
          "age": 30,
          "isLive": true
        }
      },
      {
        "action": {
          "type": "SetStorageClass",
          "storageClass": "NEARLINE"
        },
        "condition": {
          "age": 365,
          "matchesStorageClass": ["STANDARD"]
        }
      }
    ]
  }
}
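Such rules are applied to a bucket with gsutil (an illustrative sketch; the bucket name is a placeholder and the rules are assumed saved as lifecycle.json):

```shell
gsutil lifecycle set lifecycle.json gs://my-unique-bucket-name  # apply rules
gsutil lifecycle get gs://my-unique-bucket-name                 # verify
```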
Most popular data destination is Google Cloud Storage
Options:
Online Transfer:
- Use gsutil or API to transfer data to Google Cloud Storage
- Good for one time transfers
Storage Transfer Service:
- Recommended for large-scale (petabytes) online data transfers from your private data centers, AWS, Azure, and Google Cloud
- You can set up a repeating schedule
- Supports incremental transfer (only transfer changed objects)
- Reliable and fault tolerant - continues from where it left off in case of errors
Storage Transfer Service vs gsutil:
gsutil is recommended only when you are transferring less than 1 TB from on-premises or another GCS bucket
Storage Transfer Service is recommended if either of the conditions is met:
- Transferring more than 1 TB from anywhere
- Transferring from another cloud
Transfer Appliance: Physical transfer using an appliance
Copy, ship and upload data to GCS
Recommended if your data size is
- greater than 20TB
- OR online transfer takes > 1 week
Process:
- Request an appliance
- Upload your data
- Ship the appliance back
- Google uploads the data
- Fast copy (up to 40Gbps)
- AES 256 encryption - Customer-managed encryption keys
- Order multiple devices (TA40, TA300) if needed
Database Fundamentals
There are several categories of databases:
Relational (OLTP and OLAP), Document, Key Value, Graph, In Memory among others
Choosing type of database for your use case is not easy. A few factors:
Do you want a fixed schema?
Do you want flexibility in defining and changing your schema? (schemaless)
What level of transaction properties do you need? (atomicity and consistency)
What kind of latency do you want? (seconds, milliseconds or microseconds)
How many transactions do you expect? (hundreds or thousands or millions of transactions per second)
How much data will be stored? (MBs or GBs or TBs or PBs) and a lot more...
This was the only option until a decade back!
Most popular (or unpopular) type of databases
Predefined schema with tables and relationships
Very strong transactional capabilities
Used for OLTP (Online Transaction Processing) and OLAP (Online Analytics Processing) use cases
OLTP: Applications where a large number of users make a large number of small transactions (small data reads, updates and deletes)
Use cases:
Most traditional applications, ERP, CRM, e-commerce, banking applications
Popular databases:
MySQL, Oracle, SQL Server etc
Recommended Google Managed Services:
Cloud SQL: Supports PostgreSQL, MySQL, and SQL Server for regional relational databases (up to a few TBs)
Cloud Spanner: Unlimited scale (multiple PBs) and 99.999% availability for global applications with horizontal scaling
Applications allowing users to analyze petabytes of data
Examples: Reporting applications, Data warehouses, Business intelligence applications, Analytics systems
Sample application : Decide insurance premiums analyzing data from last hundred years
Data is consolidated from multiple (transactional) databases
Recommended GCP Managed Service
BigQuery: Petabyte-scale distributed data warehouse
OLAP and OLTP use similar data structures
BUT very different approach in how data is stored
OLTP databases use row storage
Each table row is stored together
Efficient for processing small transactions
OLAP databases use columnar storage
Each table column is stored together
High compression - store petabytes of data efficiently
Distribute data - one table in multiple cluster nodes
Execute single query across multiple nodes - Complex queries can be executed efficiently
New approach (actually NOT so new!) to building your databases
NoSQL = not only SQL
Flexible schema
Structure data the way your application needs it
Let the schema evolve with time
Horizontally scale to petabytes of data with millions of TPS
NOT a 100% accurate generalization but a great starting point:
Typical NoSQL databases trade-off "Strong consistency and SQL features" to achieve "scalability and high-performance"
Google Managed Services:
- Cloud Firestore (Datastore)
- Cloud BigTable
Cloud Datastore - Managed serverless NoSQL document database
Provides ACID transactions, SQL-like queries, indexes
Designed for transactional mobile and web applications
Firestore (next version of Datastore) adds:
Strong consistency
Mobile and Web client libraries
Recommended for small to medium databases (0 to a few Terabytes)
Cloud BigTable - Managed, scalable NoSQL wide column database
NOT serverless (You need to create instances)
Recommended for data size > 10 Terabytes, up to several Petabytes
Recommended for large analytical and operational workloads:
NOT recommended for transactional workloads (Does NOT support multi-row transactions - supports ONLY single-row transactions)
Retrieving data from memory is much faster than retrieving data from disk
In-memory databases like Redis deliver microsecond latency by storing persistent data in memory
Recommended GCP Managed Service
Memorystore
Use cases : Caching, session management, gaming leader boards, geospatial applications
Databases/caches
- A startup with quickly evolving schema (table structure): Cloud Datastore/Firestore
- Non-relational DB with less storage (10 GB): Cloud Datastore
- Transactional global database with predefined schema needing to process millions of transactions per second: Cloud Spanner
- Transactional local database processing thousands of transactions per second: Cloud SQL
- Cache data (from database) for a web application: Memorystore
- Database for analytics processing of petabytes of data: BigQuery
- Database for storing huge volumes of stream data from IoT devices: BigTable
- Database for storing huge streams of time series data: BigTable
IAM
You have resources in the cloud (examples - a virtual server, a database etc)
You have identities (human and non-human) that need to access those resources and perform actions
For example: launch (stop, start or terminate) a virtual server
How do you identify users in the cloud?
How do you configure resources they can access?
How can you configure what actions to allow?
In GCP: Identity and Access Management (Cloud IAM) provides this service
Authentication (is it the right user?) and Authorization (do they have the right access?)
Identities can be
A GCP User (Google Account or Externally Authenticated User)
A Group of GCP Users
An Application running in GCP
An Application running in your data center
Unauthenticated users
Provides very granular control
Limit a single user:
- to perform a single action
- on a specific cloud resource
- from a specific IP address
- during a specific time window
I want to provide access to manage a specific cloud storage bucket to a colleague of mine:
Important Generic Concepts:
Member: My colleague
Resource: Specific cloud storage bucket
Action: Upload/Delete Objects
In Google Cloud IAM:
Roles: A set of permissions (to perform specific actions on specific resources)
Roles do NOT know about members. It is all about permissions!
How do you assign permissions to a member?
Policy: You assign (or bind) a role to a member
1: Choose a Role with right permissions (Ex: Storage Object Admin)
2: Create Policy binding member (your colleague) with role (permissions)
IAM in AWS is very different from GCP (Forget AWS IAM & Start FRESH!)
Example: Role in AWS is NOT the same as Role in GCP
Member : Who?
Roles : Permissions (What Actions? What Resources?)
Policy : Assign Permissions to Members
Map Roles (What?) , Members (Who?) and Conditions (Which Resources?, When?, From Where?)
Remember: Permissions are NOT directly assigned to Member
Permissions are represented by a Role
Member gets permissions through Role!
A Role can have multiple permissions
You can assign multiple roles to a Member
Roles are assigned to users through IAM Policy documents
Represented by a policy object
Policy object has list of bindings
A binding, binds a role to list of members
Member type is identified by prefix:
Example: user, serviceAccount, group or domain
{
  "bindings": [
    {
      "role": "roles/storage.objectAdmin",
      "members": [
        "user:you@in28minutes.com",
        "serviceAccount:myAppName@appspot.gserviceaccount.com",
        "group:administrators@in28minutes.com",
        "domain:google.com"
      ]
    },
    {
      "role": "roles/storage.objectViewer",
      "members": [
        "user:you@in28minutes.com"
      ],
      "condition": {
        "title": "Limited time access",
        "description": "Only up to Feb 2022",
        "expression": "request.time < timestamp('2022-02-01T00:00:00.000Z')"
      }
    }
  ]
}
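A policy binding like the ones above is typically created with gcloud rather than by editing JSON directly (illustrative; the project name is a placeholder):

```shell
# Bind the Storage Object Admin role to a member on a project
gcloud projects add-iam-policy-binding my-project \
    --member=user:you@in28minutes.com \
    --role=roles/storage.objectAdmin

# Inspect the resulting policy (bindings of roles to members)
gcloud projects get-iam-policy my-project
```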
Scenario: An Application on a VM needs access to cloud storage
You DON'T want to use personal credentials to allow access
(RECOMMENDED) Use Service Accounts
Identified by an email address (Ex: id-compute@developer.gserviceaccount.com)
Does NOT have password
Has a private/public RSA key-pairs
Can't login via browsers or cookies
Service account types:
Default service account - Automatically created when some services are used
(NOT RECOMMENDED) Has Editor role by default
User Managed - User created
(RECOMMENDED) Provides fine grained access control
Google-managed service accounts - Created and managed by Google
Used by GCP to perform operations on user's behalf
In general, we DO NOT need to worry about them
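A sketch of creating and using a user-managed service account for the scenario above (all names are placeholders):

```shell
# Create a user-managed service account
gcloud iam service-accounts create my-app-sa --display-name="My App"

# Grant it only the access it needs (fine grained access control)
gcloud projects add-iam-policy-binding my-project \
    --member=serviceAccount:my-app-sa@my-project.iam.gserviceaccount.com \
    --role=roles/storage.objectViewer

# Attach it to a VM so the application uses its permissions, not yours
gcloud compute instances create my-vm --zone=us-central1-a \
    --service-account=my-app-sa@my-project.iam.gserviceaccount.com \
    --scopes=cloud-platform
```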