Google Cloud & Vertex AI

Google is the birthplace of TensorFlow, the Transformer architecture, and TPUs. Its cloud platform offers first-party access to cutting-edge hardware (TPU v5), tight integration with Google's AI research, and Gemini, one of the world's most capable frontier models.

📖 Covers: Vertex AI · Training Pipelines · Model Registry · Gemini API · TPUs · BigQuery ML · AutoML

GCP AI Service Overview

🧠
Vertex AI

Unified ML platform: notebooks, AutoML, custom training, pipelines, model deployment

💎
Gemini API

Access Gemini 1.5 Pro (1M token context), Gemini Flash (fast/cheap) via API

⚡
TPU v5

Google's custom AI chips — up to 4× faster than H100 for specific workloads

📊
BigQuery ML

Train and run ML models with SQL — no Python required

🤖
AutoML

Train production-quality models on your data without writing code

🔍
Vertex AI Search

Enterprise semantic search powered by Gemini and vector embeddings

Vertex AI: The ML Platform

Vertex AI unifies all Google Cloud ML services under one API and UI:

Workbench

Managed Jupyter notebooks with GPU/TPU support. Pre-installed with TF, PyTorch, JAX.

Pipelines

Kubeflow-based ML pipelines for reproducible training workflows.

Experiments

Track hyperparameters and metrics across training runs (similar to MLflow).

Model Garden

Deploy open-source models (Llama, Gemma, Mistral) with one click.

Endpoints

Scalable REST endpoints with traffic splitting and A/B testing.

Custom Training on Vertex AI

Python · Vertex AI Custom Training Job
from google.cloud import aiplatform

aiplatform.init(project='my-project', location='us-central1')

job = aiplatform.CustomTrainingJob(
    display_name="my-pytorch-training",
    script_path="train.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-1:latest",
    requirements=["transformers", "datasets"],
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/pytorch-gpu.2-1:latest"
)

model = job.run(
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_A100",
    accelerator_count=1,
    replica_count=1,
    args=["--epochs=20", "--learning-rate=0.001"]
)
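The `model` returned by `job.run()` can then be pushed to one of the Endpoints described above, including the traffic splitting used for A/B tests. The sketch below is illustrative rather than a verified recipe: the version names in the split are hypothetical, and the exact `deploy()` arguments should be checked against the google-cloud-aiplatform docs.

```python
def validate_traffic_split(split):
    # Traffic percentages across deployed model versions must total 100.
    if sum(split.values()) != 100:
        raise ValueError("traffic split must sum to 100")
    return split

def deploy_with_canary(model, endpoint, split):
    # Assumes `model` and `endpoint` are google.cloud.aiplatform objects
    # (e.g. the `model` returned by job.run() above); argument names
    # follow the google-cloud-aiplatform SDK.
    validate_traffic_split(split)
    model.deploy(
        endpoint=endpoint,
        machine_type="n1-standard-4",
        traffic_split=split,  # e.g. canary the new version at 20%
    )

# Hypothetical split: 80% of requests to the current version, 20% canary.
print(validate_traffic_split({"current": 80, "canary": 20}))
```

Splitting traffic this way lets you compare a new model version against the incumbent on live requests before promoting it to 100%.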

Gemini API

Python · Gemini API with google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel('gemini-1.5-pro')

# Text + image multimodal
import PIL.Image
img = PIL.Image.open("chart.png")
response = model.generate_content(["Analyse this chart:", img])
print(response.text)

# Streaming
for chunk in model.generate_content("Explain LLMs", stream=True):
    print(chunk.text, end="")

TPUs — Google's Secret Weapon

TPUs (Tensor Processing Units) are Google's custom AI chips, designed specifically for the matrix operations in neural networks. They use a systolic array architecture that is highly efficient for the dense matrix multiplications at the heart of transformers.
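To make "systolic array" concrete, here is a toy, software-only sketch of the idea: each output cell accumulates one partial product per cycle as operands stream through the grid. Real TPUs do this in fixed-function hardware (a large matrix-multiply unit), so the function name and loop structure here are purely illustrative.

```python
def systolic_matmul(A, B):
    # Toy output-stationary systolic matmul: cell (i, j) holds C[i][j]
    # and performs one multiply-accumulate per "cycle" as values of
    # A's rows and B's columns stream past it.
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0] * m for _ in range(n)]
    for cycle in range(k):            # k cycles of streaming operands
        for i in range(n):
            for j in range(m):
                C[i][j] += A[i][cycle] * B[cycle][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19, 22], [43, 50]]
```

The hardware version wins because every cell works in parallel each cycle and operands are reused as they flow through the array, instead of being re-fetched from memory.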

| Chip | Peak BF16 TFLOPs | Memory (HBM) | Best For |
|---|---|---|---|
| NVIDIA H100 | 989 | 80 GB HBM3 | General purpose, PyTorch |
| Google TPU v5e | 393 per chip | 16 GB HBM2 | Large-scale training, JAX |
| Google TPU v5p Pod | 459,000 total | 95 GB / chip | Frontier model training |
💡 When to Use TPUs

TPUs excel at large-batch training with JAX and TensorFlow. They're more complex to use than GPUs but dramatically cheaper for the right workloads; Google trains Gemini exclusively on TPU pods. For most users, GPUs remain the practical choice.

BigQuery ML — SQL for Machine Learning

SQL · Train a Classifier in BigQuery
-- Train a logistic regression model directly in SQL
CREATE OR REPLACE MODEL `myproject.dataset.churn_model`
OPTIONS(
    model_type = 'logistic_reg',
    input_label_cols = ['churned'],
    data_split_method = 'auto_split'
) AS
SELECT
    days_since_last_purchase,
    total_orders,
    avg_order_value,
    customer_age,
    churned
FROM `myproject.dataset.customer_features`;

-- Evaluate
SELECT * FROM ML.EVALUATE(MODEL `myproject.dataset.churn_model`);

-- Predict
SELECT * FROM ML.PREDICT(MODEL `myproject.dataset.churn_model`,
    TABLE `myproject.dataset.new_customers`);
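Under the hood, `ML.PREDICT` for a `logistic_reg` model reduces to a sigmoid over a learned weighted sum of the input features. A pure-Python sketch of that scoring step (the weights and bias below are made up for illustration; BigQuery learns the real ones during `CREATE MODEL`):

```python
import math

def logistic_predict(features, weights, bias):
    # What ML.PREDICT computes for a logistic_reg model:
    # a sigmoid over a weighted sum of the features.
    z = bias + sum(w * x for w, x in zip(weights, features))
    prob = 1 / (1 + math.exp(-z))
    return {"predicted_churned": prob >= 0.5, "probability": prob}

# Hypothetical weights for the four features in the query above:
# days_since_last_purchase, total_orders, avg_order_value, customer_age.
result = logistic_predict([30, 5, 42.0, 3], [0.02, -0.3, -0.01, 0.05], -0.5)
print(result)  # probability well below 0.5, so predicted_churned is False
```

The SQL interface hides this entirely, which is the point of BigQuery ML: the model is trained, evaluated, and scored without leaving the warehouse.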

Frequently Asked Questions

How do I get free TPU access?

Google's TPU Research Cloud (TRC) program offers free TPU access to researchers. Apply at google.com/tpu/trc. Kaggle notebooks also provide free TPU time. For commercial use, TPUs are rented by the hour through Google Cloud.

Vertex AI vs SageMaker — which is better?

SageMaker has broader adoption (more Stack Overflow answers, tutorials). Vertex AI has tighter integration with Google's AI research and TPUs. If your team uses Google Workspace or BigQuery heavily, Vertex AI is the natural fit. For pure ML engineering, both are comparable.
