GCP MLE

Minimum number of data points required to create a dataset in Vertex AI 1000

[True/False] Feature definitions should change over time False

[True/False] Do not build monolithic models, make them small and simple True

[with/without] Make an API for prediction for an ML model XX many inputs without

[V., S., J., T., N., V. I.] Google's pre-trained ML models vision API, speech API, jobs API, translation API, natural language API, video intelligence API

[better/worse] A simple ML model with lots of data to train is XX than a complex fancy model with little data to train with better

[B., S.] An ML pipeline should handle both XX and YY data. Not doing so is a common cause of model failures in practice batch, streaming

[D. C., B. I.] Steps in the ML workflow that usually take the most time data collection, building infrastructure

Component of Vertex AI: Used to host data managed datasets

Component of Vertex AI: Used as a repository of features feature store

Component of Vertex AI: Used to have humans label your data data labeling

Component of Vertex AI: Used to host jupyter lab instances workbench

Component of Vertex AI: Automate, monitor, and govern ML systems pipelines

Component of Vertex AI: Can do both AutoML and custom training training

Component of Vertex AI: During experimentation, can be used as a black-box tool to tune hyperparameters for a model. Can use TensorBoard to compare runs vizier

Component of Vertex AI: Deploy a trained model. endpoints

[True/False] To host a model on a Vertex AI endpoint, it has to have been trained on Vertex AI False

Component of Vertex AI: Hosts ML metadata and artifacts such as evaluation metrics ML metadata

For a deployed model endpoint, A/B testing can be conducted by tweaking the XX traffic split

According to Google, this type of model can be used for reinforcement learning, pattern recognition, self-driving cars, and cyber security GAN

[U., G.] On Vertex AI Notebooks, there are 2 types of notebook; XX managed and YY managed notebooks. user, google

Flow to train custom container models: XX -> YY -> ZZ -> Vertex Training dockerfile, cloud build, container registry

Store unstructured data such as images, video, and audio in XX and use YY to label it cloud storage, data labeling

Best practice service to use for preprocessing tabular data bigquery

Best practice service to use for preprocessing unstructured data dataflow

Dataflow can convert data into binary data formats like TFRecord

Use the security blueprint in Workbench notebooks to secure PII data

Human XX lead to XX in ML models since you choose the data to train with biases

Type of bias: You interact with the dataset which creates a bias in the data interaction bias

Type of bias: You have a class-skewed dataset, e.g. more men are firefighters => almost only men in the dataset latent bias

Type of bias: You fiddle with the dataset and remove specific data selection bias

Type of bias: Some instances of a class go unreported, so the data does not reflect how things really are reporting bias

Type of bias: In human labeling, we label according to how we perceive the world confirmation bias

Biases can occur XX in the ML pipeline everywhere

[W. T., T.] Tools in GCP that can show a model's fairness performance what-if tool, tensorboard

[Can, Can not] A confusion matrix XX be used to identify model biases can

Google confusion matrix layout: x-axis, y-axis model predictions, true values

Confusion matrix component [abbreviation]: Predicted positive, actual label positive TP

Confusion matrix component [abbreviation]: Predicted positive, actual label negative FP

Confusion matrix component [abbreviation]: Predicted negative, actual label positive FN

Confusion matrix component [abbreviation]: Predicted negative, actual label negative TN

Precision = TP/(TP+FP)

Recall = TP/(TP+FN)
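
Worked example for the precision and recall formulas above. This is an illustrative sketch, not part of the course material: the function name and the spam-filter counts are made up for the example.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    # Guard against division by zero when nothing was predicted/labelled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical spam filter: 8 spam caught (TP), 2 normal mails flagged (FP),
# 4 spam mails missed (FN).
p, r = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision only looks at what was flagged, recall only at what should have been flagged, which is why the two can move in opposite directions.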

Visualize and gain insights such as class imbalance, summary statistics, and missing values for large datasets facets

GCP service: IaaS, raw compute, storage and network compute engine

GCP service: Containerized applications in a cluster Google kubernetes engine

GCP service: PaaS, bind code to libraries app engine

GCP service: Execute code in a response to events. Serverless cloud functions

Cloud storage class: For hot data used commonly standard

Cloud storage class: For data accessed once per month nearline

Cloud storage class: For data accessed every 90 days coldline

Cloud storage class: For data accessed once a year archive

Storage service: Unstructured data, blob storage cloud storage

Storage service: Structured data, transactional workloads, SQL, regionally scalable cloud SQL

Storage service: Structured data, transactional workloads, SQL, globally scalable cloud spanner

Storage service: Structured data, transactional workloads, NoSQL firestore

Storage service: Structured data, analytical workloads, NoSQL cloud bigtable

Ingestion and process service: No code, GUI solution datafusion

GCP AI service: Pre-trained models that extract structured data from unstructured documents document ai

Document AI component: Google's general model to analyse the data general

Document AI component: Google's specialized model to identify special data such as receipts, driver's licenses specialized

Document AI component: You create a model to extract data from an unstructured data source custom

GCP AI service: AI-powered contact center experience contact center ai

GCP AI service: Rapidly generate healthcare insights and analytics with one end-to-end solution healthcare data engine

Dataflow can XX to meet a high demand in the pipeline autoscale

When building a data pipeline, it is important to factor in whether it handles streaming and/or batch data, and what to do with XX data coming in. late

[J., P., G.] Languages with pipeline templates for Apache Beam/Dataflow java, python, go

[True/False] Dataflow is serverless and NoOps True

With BigQuery, you can pay as you go or use BQ XX pricing flat-rate

[True/False] In BigQuery, data is not encrypted at rest by default False

[C. M., F. M.] Steps/commands required to create and use a BQML model CREATE MODEL, FROM ML.PREDICT

General model class to choose: Supervised, classify data. E.g. is this email spam? logistic regression

General model class to choose: Supervised, predict a number. E.g. shoe sales next month? linear regression

General model class to choose: Unsupervised, identify patterns and clusters. E.g. grouping photos cluster analysis

BQML automatically applies XX to categorical data and automatically YY the dataset into training and evaluation sets one-hot encoding, splits

[True/False] It is not mandatory to specify the model_type in BQML False

[B., P. A., A., C. T.] Ways to build ML models on GCP BQML, pre-built APIs, autoML, custom training

Compared to other GCP low-code solutions, BQML only supports this datatype tabular

Google ML solution: No training data pre-built APIs

Google ML solution: small to medium amount of data autoML

[B., C.] Google ML solution: Medium to large amount of data BQML, custom

[P. A., A.] Google ML solution: No options to choose hyperparameters pre-built APIs, autoML

Google ML solution: Medium options to choose hyperparameters BQML

Google ML solution: Lots of options to choose hyperparameters custom

Google ML solution: No time to train models pre-built APIs

[B., A.] Google ML solution: Medium time required to train BQML, autoML

Google ML solution: Long time required to train custom

Google ML solution: Familiar with SQL and have data in BigQuery BQML

Google ML solution: Have little ML expertise pre-built APIs

Google ML solution: Want to build a custom model with own training data with minimal code autoML

Google ML solution: Want to build a custom model with own training data and full control custom

To get feature importance in BQML, one can inspect the XX, e.g. using SELECT * FROM ML.XX(MODEL `mydataset.mymodel`) weights

Pre-built APIs: Converts audio to text for data processing speech-to-text API

Pre-built APIs: Recognizes parts of speech called entities and sentiment natural language API

Pre-built APIs: Convert text from one language to another translation API

Pre-built APIs: Convert text into high-quality voice audio text-to-speech API

Pre-built APIs: Recognizes content in static images vision API

Pre-built APIs: Recognizes motion and action in videos video intelligence API

[S., M., C.] Common production challenges for ML models; scalability, monitoring, CI/CD

Google ML solution: Gives retailers the ability to provide google search quality recommendations retail product discovery

[A., C. T., F. S., V., E. A., P.] Vertex AI provides components to: train models (two options), store features in a repository, tune hyperparameters, interpret models, and monitor the ML pipelines autoML, custom training, feature store, vizier, explainable AI, pipelines

[MATLAB notation] Confusion matrix setup [TP, FN; FP, TN]

[Precision/Recall] Prioritise catching a lot of spam emails -> high recall

[Precision/Recall] Prioritise catching only spam emails -> high precision

GCP Feature Store is a centralised feature repository to serve features at scale with low XX latency

A collection of semantically related features entity type

In the feature store, each entity must have a unique XX and must be of type YY id, string

Is the process of importing feature values computed by feature engineering jobs into a featurestore feature ingestion

XX is the process of exporting stored features for training or inference. YY XX for high throughput and serving large volumes of data for offline processing. ZZ XX for low-latency data retrieval of small batches of data for real-time processing. feature serving, batch, online

[S., R.] The feature store solves the common pain point of being hard to XX and YY features. share, reuse

For source data in the feature store, column name must be of type string

[C., A., B.] For source data in the feature store, the supported file formats/sources are; CSV, Avro, Bigquery

It is important to XX the data before doing any feature engineering cleaning

XX is the process of creating new improved features by combining different features feature engineering

[N., Ca., B., Cr., E., H.] Feature types; numerical, categorical, bucketized, crossed, embedding, hashed

XX can be used to do automatic feature extraction since manual extraction can be time-consuming PCA

By using PCA to reduce the dimensions of your feature space, the model will be less likely to XX overfit

[Lo., La.] If possible, map raw data to numerical features, e.g. instead of using a street name, obtain and use the XX and YY longitude, latitude

It is essential that a feature is known at XX time prediction

It is important that a numerical feature has a meaningful XX magnitude

[Should/Should not] Feature definitions XX change over time should not

It is important to consider if there is a time XX for the feature, i.e. if the data arrives 3 days late, the model only sees data that is 3 days old delay

A word feature should be a XX so it can hold relationships to other words word vector

Rule of thumb: Each feature value should have at least XX examples 5

It is important to consider if a XX feature should be one-hot encoded or left as is. E.g what to do if ratings are 1-5 and a user gives no rating? numeric

[ML/Statistics] Mindset: Let's collect more on the outliers, create a separate model for them ML

[ML/Statistics] Mindset: Let's exclude the outliers in our model statistics

[ML/Statistics] XX is usually best to use when you have a small amount of data statistics

BQML has two types of feature preprocessing: XX occurs during training; YY uses the TRANSFORM clause to define the preprocessing automatic, manual

[True/False] BQML handles the data split True

BQML by default assumes that Numbers are XX features and String are YY features numerical, categorical

XX is a synthetic feature formed by multiplying two or more features. This also reduces the number of features reducing the risk of YY. feature crosses, overfitting

Feature type: Depends on space e.g distance spatial

Feature type: Depends on time e.g pickup time temporal

For temporal features, it is important to XX them, e.g using normalization scale

In Dataflow, each step in the DAG inputs and outputs a PCollection

[True/False] A PCollection does not store all of its data in memory; it can be distributed over multiple servers where the data is stored. True

Since Dataflow is distributed, only save to local disk when you have a XX-node cluster one

[G., T.] TensorFlow is efficient since it enables XX and YY acceleration GPU, TPU

XX is the most efficient format for data in TensorFlow TFRecords

Complete the ML preparation pipeline: Raw Data -> XX -> YY -> ZZ -> Model Training data extraction, data analysis, data preparation

Meaning of EDA and CDA exploratory data analysis, classic data analysis

Flow for CDA: Problem -> Data -> XX -> YY -> Conclusion model, analysis

Flow for EDA: Problem -> Data -> XX -> YY -> Conclusion analysis, model

The purpose of Bayesian statistics is to determine XX probabilities based on YY probabilities and new information posterior, prior
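
Worked example of going from a prior to a posterior with Bayes' rule. A minimal sketch: the function name, parameter names, and the numbers are assumptions chosen for illustration, not from the course.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(hypothesis | evidence) via Bayes' rule for a binary hypothesis.

    prior:               P(hypothesis)
    likelihood:          P(evidence | hypothesis)
    false_positive_rate: P(evidence | not hypothesis)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of mails are spam, the filter fires on 90% of spam
# and on 5% of normal mails. How likely is a flagged mail to be spam?
p_spam_given_flag = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p_spam_given_flag, 4))
```

The new information (the filter firing) moves the probability from the 1% prior to a posterior of roughly 15%.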

EDA type: Simplest for analyzing data. For one variable univariate

EDA type: Used to find out if there are relationship between two variables bivariate

Regression evaluation metric: SUM(|y-y*|) / N = MAE

Regression evaluation metric: SUM((y-y*)^2) / N = MSE

Regression evaluation metric: SQRT(SUM((y-y*)^2) / N) = RMSE
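
Worked example for the three regression metrics above, where y is the true value and y* the prediction. An illustrative sketch only; the function name and the sample values are made up.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equally long sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # SUM(|y - y*|) / N
    mse = sum(e * e for e in errors) / n    # SUM((y - y*)^2) / N
    rmse = math.sqrt(mse)                   # SQRT(MSE)
    return mae, mse, rmse


# Errors are 1, 0 and -2.
mae, mse, rmse = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(mae, mse, rmse)
```

RMSE is in the same unit as the target, while MSE punishes the single large error (-2) more than MAE does.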

Transform a linear regression model by adding a XX activation function to output between 0 and 1 to use as a logistic regression model sigmoid

By adding a XX term to the loss, overfitting can be combated regularization

By adding XX to the model training, overfitting can be combated early stopping

XX regularization will keep the weight values smaller L2

XX regularization will keep the model sparser L1

[True/False] You can not use L1 and L2 regularization at the same time False
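
Sketch of how L1 and L2 terms can be added to a loss at the same time (often called elastic net). The function and parameter names are assumptions for illustration; real frameworks expose this through their own regularizer settings.

```python
def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Add L1 and/or L2 penalty terms to an already computed data loss."""
    l1_term = l1 * sum(abs(w) for w in weights)   # pushes weights to exactly 0 (sparser)
    l2_term = l2 * sum(w * w for w in weights)    # keeps weight values small
    return data_loss + l1_term + l2_term


# Both penalties active at once.
loss = regularized_loss(1.0, [1.0, -2.0], l1=0.1, l2=0.01)
print(loss)
```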

XX can be seen as an approximate equivalent of YY regularization, and can therefore be used instead since it is computationally cheaper. early stopping, L2

For logistic regression, use a XX plot of bucketed bias to find slices where your model performs poorly calibration

[I., Ta., Te., V.] Raw data datatypes supported by AutoML image, tabular, text, video

AutoML default data split in %: training, validation, test 80, 10, 10

XX is the square of the Pearson correlation coefficient between the observed and predicted values. A higher value indicates a YY quality model. R^2, higher

Classification Metric: Area under the precision recall curve PR AUC

Classification Metric: Area under the receiver operating characteristic curve ROC AUC

Classification Metric: Cross entropy between the model predictions and the target values log loss

Classification Metric: Harmonic mean of precision and recall F1 score
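
Worked example for the last two classification metrics above (log loss and F1 score). A sketch for binary labels only; the function names and the clipping constant `eps` are assumptions for the example.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross entropy between labels (0/1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(f1_score(0.8, 0.5))
print(log_loss([1, 0], [0.9, 0.2]))
```

The harmonic mean stays low unless both precision and recall are high, which is why F1 is preferred over a plain average.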

In Vertex AI, batch prediction is XX, meaning that the model will not return a result until it has processed all prediction requests. asynchronous

In Vertex AI, online prediction is XX, meaning that the model will quickly return a prediction, but only accepts one prediction request per API call. synchronous

AutoML tabular minimum requirements; Number of columns, rows 2, 1000

AutoML tabular maximum requirements; Number of columns, rows [millions], data size 1000, 100, 100 GB

In BQML, XX is very computational expensive and can therefore only be done on a flat-rate plan matrix factorization

RMSE is bad for categorical data, use XX instead cross-entropy

XX is the process of taking a small subset of the data for each step. Reducing the memory usage and is easier to parallelise. mini-batching

[True/False] Batch gradient descent uses a mini-batch and not the full data False
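
Sketch of mini-batching: each training step sees only a small slice of the data, whereas batch gradient descent would use the whole list at once. The helper name is made up for the example.

```python
def mini_batches(data, batch_size):
    """Yield consecutive slices of `data` with at most `batch_size` items each."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


batches = list(mini_batches(list(range(10)), batch_size=4))
print(batches)
```

In practice the data is usually shuffled before each pass; that step is left out here to keep the sketch short.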

[Directly, Indirectly] Performance metrics should be XX connected to business goals, while loss functions can be YY connected to the business goals. directly, indirectly

Other notation for False positive (FP) = XX error type I

Other notation for False negative (FN) = XX error type II

[Overfitting/Underfitting] Be aware of XX as you increase model complexity overfitting

ML technique: Iteratively takes a different subset of the data for validation until all data has been used. Then averages the performance results. cross validation

Strategy for splitting data: Use when you have a lot of data fixed splitting

Strategy for splitting data: Use when you have a small amount of data cross validation
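
Sketch of how cross validation rotates the validation subset so that every sample is validated exactly once. The helper name and the fold sizes are assumptions for illustration; libraries such as scikit-learn provide their own implementations.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Train one model per fold, score it on that fold's validation indices, and average the k scores.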

You want a XX dataset when you build a model so you can quickly test the whole development pipeline small

Be careful which field you split your data on, it might become an unusable XX target

Create a repeatable 80% dataset in BQ. WHERE XX( YY( ZZ(date)), AA) < BB MOD, ABS, FARM_FINGERPRINT, 10, 8
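
The same hash-then-modulo idea sketched outside BigQuery. Assumption stated loudly: FARM_FINGERPRINT is not available in the Python standard library, so SHA-256 is used here as a stand-in hash; the resulting split is repeatable but will not match the rows BigQuery selects. The function name is made up.

```python
import hashlib


def in_training_split(key, buckets=10, train_buckets=8):
    """Deterministically assign `key` to the training split (~80% by default)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Stand-in for ABS(FARM_FINGERPRINT(key)); the value is unsigned, so no ABS needed.
    fingerprint = int.from_bytes(digest[:8], "big")
    return fingerprint % buckets < train_buckets


dates = [f"2024-01-{day:02d}" for day in range(1, 31)]
train = [d for d in dates if in_training_split(d)]
print(len(train), "of", len(dates), "dates land in the training split")
```

Because the assignment depends only on the hashed field, rerunning the split gives the same rows, and all rows sharing a date end up on the same side.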

TensorFlow is an open-source high-performance library for numerical computations that uses XX directed graphs

Tensorflow uses directed graphs since it makes it more XX and can be easily adapted to another device or language. portable

The lowest level of Tensorflow is called Core Tensorflow (XX), and you [can/can not] add your own code here C++, can

[S., D., C., V.] When you create a tensorflow tensor, you specify a XX, the YY, and if it is ZZ or a AA shape, data, constant, variable

Tensorflow: Records operations for automatic differentiation gradient tape

tf.data.Dataset allows you to; Create XX from in-memory dicts and list of YY and out-of-memory ZZ data files. data pipelines, tensors, sharded

tf.data.Dataset allows you to; Preprocess data in XX and YY results of costly operations parallel, cache

tf.data dataset classes for reading: one or more text files; TFRecord files; one or more fixed-length binary files TextLineDataset, TFRecordDataset, FixedLengthRecordDataset

[<shortest>, <longest>] In TensorFlow, to create a dataset from in-memory tensors, use tf.data.Dataset.XX (for one element in the dataset) or tf.data.Dataset.YY (for many elements in the dataset) from_tensors, from_tensor_slices

With XX + multithreaded loading, the CPU threads keep preparing the data for the next batch while the GPU works prefetching

Feature column API takes care of packing the inputs into the input vector of the model, e.g. by automatically XX categorical input values one-hot encoding

Feature column API function to create a categorical column: If you know the keys beforehand -> tf.feature_column.categorical_column_with_XX vocabulary_list

Feature column API function to create a categorical column: If your data is already indexed -> tf.feature_column.categorical_column_with_XX identity

Feature column API function to create a categorical column: If you do not have a vocabulary of all possible values -> tf.feature_column.categorical_column_with_XX hash_bucket
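
Plain-Python sketch of the ideas behind the vocabulary-list and hash-bucket columns above. This is not how TensorFlow implements them: the function names are made up and MD5 is used only as a convenient stable hash.

```python
import hashlib


def one_hot_from_vocabulary(value, vocabulary):
    """Known keys: one slot per vocabulary entry, all zeros for unseen values."""
    return [1 if value == item else 0 for item in vocabulary]


def hash_bucket(value, num_buckets):
    """Unknown vocabulary: map any string to one of `num_buckets` ids."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


print(one_hot_from_vocabulary("b", ["a", "b", "c"]))
print(hash_bucket("some-unseen-city", num_buckets=5))
```

Hashing needs no vocabulary up front, at the price of collisions: different values can share a bucket.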

Tensorflow can directly operate on sparse tensors -> saves XX and YY time memory, computation

[Sparse/Dense] tf.feature_column.embedding_column represents data as a lower-dimensional XX vector dense

tf.keras.layers.Discretization turns continuous numerical features into XX data with discrete ranges. bucket
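
Plain-Python sketch of bucketizing a continuous feature. It assumes half-open buckets (-inf, b0), [b0, b1), ..., [b_last, +inf); the function name and the age boundaries are made up, and this is not the Keras implementation.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """Return the index of the bucket that `value` falls into.

    `boundaries` must be sorted in ascending order.
    """
    return bisect_right(boundaries, value)


age_boundaries = [18, 35, 65]
buckets = [bucketize(age, age_boundaries) for age in [5, 18, 40, 90]]
print(buckets)
```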

tf.keras.layers.XX turns String categorical values into an encoded representation that can be read by an embedding layer or Dense layer StringLookup

tf.keras.layers.XX turns Integer categorical values into an encoded representation that can be read by an embedding layer or Dense layer IntegerLookup

[In/Outside of, N., R.] When running on a TPU, you should always place preprocessing XX the tf.data pipeline. Except for the YY and ZZ operations, which run well on a TPU and are common in the first layer of an image model. in, normalization, rescaling

Activation function: f(x) = max(0, x), popular since 10 times faster than sigmoid ReLU

A problem with the normal ReLU activation function is that a layer can die if it only gets inputs XX 0 <

Activation function: A smooth approximation of ReLU (its derivative is the sigmoid) softplus

Activation function: ReLU but lets some signal through when the input is <0 leaky ReLU

Activation function: ReLU but has a parameter that controls how much gets through when the input is <0 parametric ReLU
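
The activation functions above written out for a single scalar input. A sketch for intuition only; frameworks apply these element-wise on tensors. Parametric ReLU has the same formula as leaky ReLU, except that `alpha` is learned during training instead of fixed (the default 0.01 here is an assumption).

```python
import math


def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negative inputs are scaled by a small slope instead of zeroed."""
    return x if x > 0 else alpha * x


def softplus(x):
    """log(1 + e^x), written in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), leaky_relu(x), softplus(x))
```

For inputs below 0, ReLU outputs 0 and has zero gradient (the "dying" problem), while the leaky and smooth variants still pass a small signal.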

A Keras sequential model consists of XX layers and has YY input and ZZ output stacked, one, one

An example of a non sequential DNN is model with XX connections or a model with YY branches residual, multi

[Overfitting/Underfitting] The deeper the DNN network is, the more prone it is to XX overfitting

The Adam optimizer is known for being computationally efficient and having low XX requirements memory

[N., S.] The Adam optimizer is well suited to problems with XX or YY gradients noisy, sparse

To train a Keras model, use history = model.XX(...) fit

Keras: Datatype of predictions where, predictions = model.predict(input_samples, steps=1) numpy array

[Sparse/Dense, Correlated/Independent] Linear models are good for XX and YY features. DNNs are good for ZZ and AA features. sparse, independent, dense, correlated

In Keras, the XX API is more flexible than the sequential API since it can handle non-linear topologies functional

If a layer is XX, both models' training data will help to train the layer, i.e. less data is required shared

[Lower/Higher] L1 regularization has a XX chance of making weights 0 higher

Tensorflow in Vertex AI can be used to do XX training distributed

For submitting a Vertex AI custom job using CLI, the python-XX flag takes a comma-separated list of Cloud Storage URIs specifying the Python package files used to set up models. The maximum number of Cloud Storage URIs is YY package-uris, 100

For submitting a Vertex AI custom job using CLI for distributed training, specify multiple XX flags in the call compared to one for non-distributed. worker-pool-spec

For submitting a Vertex AI custom job using CLI, command-line arguments XX the commands in the config.yaml overrides

[True/False] ML systems can easily build up technical debt True

MLOps level of automation: Build and deploy manually 0

MLOps level of automation: Automate the training phase 1

MLOps level of automation: Automate training, validation and deployment 2

Build your own container using XX and retrieve code directly from GitHub, Cloud Source Repositories and Artifact Registry cloud build

Orchestrates multiple containers, handles load balancing and adapts to the declared state kubernetes

[True/False] Kubernetes supports both stateless applications such as nginx & apache web servers, and stateful applications which store session data persistently. True

Managed service for Kubernetes within Google Cloud Google kubernetes engine

[U., R.] Google Kubernetes Engine can XX and YY nodes automatically. upgrade, repair

GCP Service: Fully fledged VM on GCP compute engine

GCP Service: Enables stateless containers via web request cloud run

A problem with autoscaling, called XX, is that the number of deployed replicas fluctuates because the metric used to spawn them fluctuates. This can be combated using a YY for the deployment. thrashing, cooldown
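
Toy simulation of the effect described above: the same fluctuating load with and without a cooldown. Everything here (function name, load numbers, the rule "wait N steps after any change") is an assumption for illustration and not how GKE's autoscaler is implemented.

```python
def autoscale(loads, target_per_replica, cooldown_steps):
    """Return the replica count chosen at each step of a load series."""
    replicas = 1
    last_change = -cooldown_steps  # allow a change at the very first step
    history = []
    for step, load in enumerate(loads):
        desired = max(1, -(-load // target_per_replica))  # ceiling division
        # Only rescale if the cooldown since the last change has elapsed.
        if desired != replicas and step - last_change >= cooldown_steps:
            replicas, last_change = desired, step
        history.append(replicas)
    return history


loads = [100, 210, 100, 210, 100, 210]
print("no cooldown:  ", autoscale(loads, target_per_replica=100, cooldown_steps=0))
print("with cooldown:", autoscale(loads, target_per_replica=100, cooldown_steps=3))
```

Without a cooldown the replica count follows every swing of the metric; with one, it changes far less often at the cost of reacting more slowly.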

Deployment strategy: Like blue green but it is rolled out gradually canary deployment

Deployment strategy: Two versions, but only one towards the users while their interactions are mirrored in the other version shadow testing

[True/False] Kubeflow/TFX can only be used with the Tensorflow ML framework False

Pre-built Kubeflow pipelines or pipeline components can be found and shared at the public XX AI hub

Compared to GKE, using XX, only one click is required to set up an ML pipeline vertex AI pipelines

MLOps framework: Lower-level, direct control of Kubernetes resources kubeflow

MLOps framework: Higher-level abstractions. Prescriptive but customisable components with pre-defined ML types TFX

TFX brings Google's best practices for robust and XX ML workloads scalable

Using TFX XX, you can answer what data was used to train a specific model and what statistics the model has. lineage tracking

The top 3 features of Vertex AI pipelines are: 1) XX orchestration, 2) Rapid, reliable, repeatable YY, 3) ZZ and re-use components workflow, experimentation, share

Vertex AI pipelines provides a visual XX interface where each block is an ML task that can be done in sequence and in YY. graphical, parallel

Google's cloud ML Python package to capture metrics for different hyperparameters, good for hyperparameter tuning. Import XX hypertune

For making a hyperparameter job, it is important to have the hyperparameters as XX to the model training. input parameters

When pushing a trained model to Vertex AI, it is important to first create a model XX, then a model YY, and lastly you can get predictions from it. object, version

[Dw., Dc.] Kubeflow pipelines containerizes implementations of ML tasks which can invoke services such as XX and YY. dataflow, dataproc

The steps in a Kubeflow pipeline can programmatically be specified via the XX SDK Python

In an ML pipeline, it is common to have a metric XX which allows the model to be deployed to production threshold

Kubeflow component type: Just load the component from its description and compose pre-built

Kubeflow component type: The containerization is done for you, but you write the code in Python lightweight

Kubeflow component type: Write the component code and package it into a Docker container custom

Kubeflow Python function to wrap a function func into a prebuilt Docker container to make a lightweight Kubeflow component. kfp.components.XX(func, base_image=BASE_IMAGE) func_to_container_op

[True/False] To make a custom Kubeflow component, you can only use Python to create the model False

TFX is designed to make ML workflows XX between different environments portable

TFX deployment target: High performance servers for batch and streaming inference TF serving

TFX deployment target: For inference on IoT and mobile devices TF lite

TFX deployment target: For deployment to low latency web applications TF JS

TFX deployment target: For sharing models and transfer learning TF hub

A TFX component is an implemented ML XX task

A TFX component produces and consumes XX artifacts

A data-aware TFX pipeline can speed up model retraining by checking if XX is necessary between runs, or if results can be fetched from cache. recomputation

TFX standard component: Entry point for data ingest. Support splitting and partitioning. Inputs CSV, TF Records, Avro, Parquet. Outputs: TF YY and TF sequence YY example gen, examples

TFX standard component: Performs complete pass on data to gather summary statistics, e.g per feature. This includes; Mean, Standard deviation, Quantile ranges, null counts statistics gen

TFX standard component: Automatically generates schemas for TFX. Can also be used to create protobuffers schema gen

TFX standard component: Identifies anomalies in data and visualizes them. For example, it can detect YY by comparing training and serving data and detect ZZ by looking at series of data for different data splits. example validator, train-serving skew, data drift

TFX standard component: Does preprocessing such as normalization, feature engineering, tf.Transform operations. transform

By bringing in XX to your TF graph, you can reduce train-serving skew from differences in feature engineering, which is one of the largest sources of error in production ML systems. feature engineering

TFX standard component: Trains a TF model. Produces at least one saved TF model which can be shared. trainer

XX is an easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search for a Keras model keras tuner

TFX standard component: Uses Keras tuner API to tune hyperparameters. Outputs a hyperparameter artifact. Typically, you only run it one time. tuner

TFX standard component: Visualizes model evaluation. Outputs evaluation metrics and a model blessing to show if it is fit for production. evaluator

TFX standard component: Validates the model in the model infrastructure. Prevents bad models from being pushed. Output a model blessing if it is fit for production. infraValidator

TFX standard component: Pushes the model to production. Inputs a model blessing. Can deploy to different target e.g TF lite, TF JS, TF serving pusher

TFX standard component: Does batch inference on TFRecords with an exported model. Can do it remotely in the cloud or locally in-memory bulkinferrer

[First/Latest] If no model has been blessed in the TFX pipeline, the Resolver node will make the XX model blessed first

TFX library: Monitor ML development at scale. See summary statistics, data distributions, compare different datasets, do anomaly detection and automatically generate schemas. tensorflow data validation

TFX library: Preprocess and do feature engineering with TF. Useful for distributed compute and can use Apache Beam. Will automatically adapt to TF/ML YY practices. tensorflow transform, best

TFX library: Visualize data about ML experimentation. Graph of metrics, view of weights, and can easily be shared. tensorboard

TFX library: Do model evaluation. Can incorporate Fairness indicators for responsible AI development. Can view common AI fairness metrics. tensorflow model analysis

TFX evaluation libraries: XX during training, YY after training and is [less/more] granular tensorboard, tensorflow model analysis, more

ML experiments typically start in a XX instance notebook

TFX ML orchestrators are XX and therefore, the pipelines can run both on-prem and on GCP portable

No matter what TFX orchestrator you choose, TFX will produce the same standard XX for the graph. DAGs

[A. A., K. P., A. B.] TFX supports the following orchestrators; apache airflow, kubeflow pipelines, apache beam

GCP fully managed implementation of Apache airflow cloud composer

On GCP, TFX runs on XX pipeline which in turn runs on YY kubeflow, google kubernetes engine

A TFX custom component can be created by making a: XX function, YY, or by ZZ existing component classes. python, container, extending

To create a TFX custom component from a Python function, the XX decorator and input/output YY are needed. @component, type hints

By using TFX custom components from XX, languages other than Python can be used to define them. containers

[True/False] You can extend TFX components to work with other database systems such as presto, hive, snowflake or Oracle. True

TFX part which: traces what data was used per model. Caches outputs of components so they do not have to be rerun. Enables retraining from checkpoint. metadata store

XX is the process when the weights are not set to random values, but instead taken from a previous trained model. Is supported in TFX thanks to the metadata store. warm starting

You always want to XX your training applications so you do not have to worry about dependencies, can use them in Kubeflow, and make them portable between runtime environments. containerize

Process of containerizing a PyTorch, scikit-learn, or XGBoost application: 1) Create model XX script, 2) Create YY, 3) Build the image and push to ZZ training, Dockerfile, container registry

[More/Less] For continuous training, if deterioration is fast we should retrain XX frequently. more

A challenge for continuous training is to find the retraining interval which achieves the XX but keeps the YY down. business requirements, cost

A downside of Apache Airflow is that setup, logging, and management can be tedious and XX. This is something that GCP's YY tries to combat. time consuming, cloud composer

Apache Airflow component: Is represented by a node in your DAG. Is an implementation of an operator. task

Apache Airflow component: Performs an action or tell another system to perform an action. Can also be set a sensors to keep running until a criterion is met. operator

A operator in a Apacahe Airflow task can do the common operations of executing XX commands, call arbitrary YY functions, or call other ZZ services. bash, python, GCP

Apache Airflows: Operators that do nothing but shows up on the graph for completeness dummy operators

Apache Airflows: Operator which check that the values of a metrics given as a SQL expression are within a certain tolerance of values in BigQuery. BigQueryXXOperators IntervalCheck

Apache Airflows: Operator which check that the result of a query is within a certain tolerance on an expected pass value. BigQueryXXOperator ValueCheck

In Apache Airflow, if an operation fails, you can XX the whole operation or just send a message to YY. fail, pub/sub

Open source framework from Databricks to standardize the data prep/training/deploy loop. MLflow

[Coupled/Decoupled] In GCP, compute and storage are XX. decoupled

A regression model that uses L1 regularization techniques is called a XX regression lasso

A regression model that uses L2 regularization techniques is called a YY regression ridge
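As a rough illustration of the two cards above, the sketch below adds an L1 (lasso) or L2 (ridge) penalty to a mean squared error that has already been computed. The function names and the `lam` strength parameter are hypothetical and chosen only for this example.

```python
def l1_penalty(weights, lam):
    """Lasso-style penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge-style penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def penalized_loss(mse, weights, lam, kind):
    """Add the chosen penalty to an already computed mean squared error."""
    if kind == "lasso":
        return mse + l1_penalty(weights, lam)
    if kind == "ridge":
        return mse + l2_penalty(weights, lam)
    raise ValueError("kind must be 'lasso' or 'ridge'")
```

The L1 term grows linearly with each weight while the L2 term grows quadratically, which is why L1 tends to drive small weights all the way to zero.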

Main advantages of using TFRecords: fast XX since it is a sequence of bytes, and ease of YY the data, which makes it good for ZZ the dataset over multiple workers. loading, shuffling, distributing

XX is the process when the data is read at the same time as the training. parallel interleave

Keras layer which stacks the input to a 1-dimensional vector. flatten

TF metric which tells you how often the predictions are equal to the labels. tf.keras.metrics.XX Accuracy

TF metric which approximates the area under the ROC or precision-recall curve. It measures the quality of a binary classifier. tf.keras.metrics.XX AUC

[True/False] Although Google is migrating from AI Platform to Vertex AI, some CLI commands still say ai-platform True

Worker mode: Every worker can work by themselves and do not have to synchronize. Is not good if the workers [are/are not] equal in performance. async, are not

Worker mode: Every worker is in sync with each other. This [is/is not] the recommended architecture. sync allreduce, is

TF distributed training strategy: Synchronous training on one machine with many accelerators. Creates a replica of the model on each GPU. Data distribution and gradient updates are handled automatically. mirrored

TF distributed training strategy: Synchronous training on one machine with many TPU cores. Creates a replica of the model on each TPU core. Data distribution and gradient updates are handled automatically. TPU

[Can/Can not] You XX train with current data and predict with stale data can not

If using GCP products with BQ, try to use XX connectors rather than building your own pipeline. pre-built

[RMSE/MAE] XX is more sensitive to outliers than YY RMSE, MAE
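A small sketch of why this holds: squaring the errors before averaging lets a single large error dominate RMSE, while MAE grows only linearly. The toy labels and predictions below are made up for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


y_true = [10.0, 10.0, 10.0, 10.0]
even_errors = [11.0, 11.0, 11.0, 11.0]   # every prediction is off by 1
one_outlier = [11.0, 11.0, 11.0, 20.0]   # one prediction is off by 10

print(mae(y_true, even_errors), rmse(y_true, even_errors))
print(mae(y_true, one_outlier), rmse(y_true, one_outlier))
```

With evenly sized errors the two metrics agree; with one outlier, RMSE rises noticeably more than MAE.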

In AutoML, the confusion matrix is only available for classification models with XX or fewer values for the target column. 10

For a recommendation system, XX feedback is feedback the users give indirectly, such as time spent on the website. implicit

For a recommendation system, XX feedback is feedback the users give directly, such as rating a product. explicit

[The same/Differently] A recommendation system is trained XX depending on if it uses explicit or implicit feedback differently

Average rank, aka XX, is the most common metric for implicit matrix factorization (recommendation system) mean percentile rank

[Production/Test] It is better to fail at the XX stage than to fail at the YY stage. test, production

You should avoid the Keras XX() function when working with lookup layers with very large vocabularies. This function sets the state (trainable/nontrainable) of a preprocessing layer. adapt

For the Keras Functional API, unlike the Keras Sequential API, we have to provide the XX of the input to the model. shape

[Fastest/Slowest] For submitting a job with gcloud ai custom-jobs, you will get the XX performance using a single-region bucket in the same location, compared to the default of a multi-region fastest

XX is a machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them, which keeps the data secure. federated learning

[Words/Sentences] Do sentiment analysis on XX rather than YY. sentences, words

GCP no code solution for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning dataprep

The XX curve is an appropriate metric for imbalanced classification when the output can be set using different thresholds precision-recall

Cloud Composer is not a cost-efficient solution for one pipeline because its environment is always XX active

[Accelerator type] Require high-precision arithmetic -> always XX GPU

MirroredStrategy uses the memory of a single instance, while MultiWorkerMirroredStrategy has separate memory for each worker -> more XX to hold the dataset memory

TPU nodes are not recommended unless XX by the application. required

Cloud Data Loss Prevention API can be used to detect XX in a dataset PII

Hashing is an irreversible transformation that ensures XX and [does/ does not] lead to an expected drop in model performance because you keep the same feature set while enforcing referential integrity anonymization, does not
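A minimal sketch of the idea, assuming a salted SHA-256 hash from Python's standard `hashlib`. The `pseudonymize` helper and the salt value are hypothetical. The same input always maps to the same digest, so joins across tables still line up (referential integrity), while the original value cannot be read back from the digest.

```python
import hashlib


def pseudonymize(value, salt="example-salt"):
    """Return a hex digest that stands in for the original identifier."""
    data = (salt + value).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# The same user ID hashes to the same token in every table it appears in.
orders_key = pseudonymize("user-42")
clicks_key = pseudonymize("user-42")
other_key = pseudonymize("user-43")
```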

A large increase in loss is typically caused by anomalous values in the input data that cause XX traps or YY gradients NaN, exploding

[Does/Does not] A learning rate schedule that is not tuned typically shows a loss that starts oscillating after some steps but XX jump back to the top. does not

Regularization reduces XX requirements by pushing the weights for meaningless features to 0. Also, regularization tends to cause the training error to YY RAM, increase

PCA is a valid feature selection method only if the most important variables are the ones that have the most XX variation

XX can be used to identify features which are not highly correlated to the target, which can be removed to [increase/reduce] model complexity correlation analysis, reduce

Ensuring that categorical features are one-hot encoded and that continuous variables are binned, and create feature crosses for a subset of relevant features will make the model converge [faster/slower] but it increases model YY requirements, and it [is/is not] expected to boost model performance because neural networks inherently learn AA. faster, RAM, is not, feature crosses

[Should/Should not] Vertex AI Vizier XX be used for systems that do not have a known objective function or are too costly to evaluate using the objective function. should

[C., T. T.] Vizier requires sequential trials and does not optimize for XX or YY. cost, tuning time

[True/False] Running tuning locally does not optimize for reproducibility and scalability True

[efficient/inefficient] Grid Search is XX for high-dimensional spaces in terms of time, cost, and computing power. inefficient

Grid Search is a brute-force approach and it is not feasible to fully XX parallelize

Hyperparameter search method: Can limit the search iterations on time and parallelize all trials random search
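A hedged sketch of random search with a fixed trial budget. The search space, the `toy_objective` function, and all parameter names are invented for illustration. Each trial draws its parameters independently of the others, which is what makes the trials easy to run in parallel.

```python
import random


def random_search(objective, space, max_trials, seed=0):
    """Sample max_trials random points and return (best_score, best_params)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_trials):
        params = {
            name: rng.uniform(low, high) for name, (low, high) in space.items()
        }
        trials.append((objective(params), params))
    # Lower score is better in this sketch.
    return min(trials, key=lambda trial: trial[0])


def toy_objective(params):
    """Stand-in for a validation loss; smallest near lr=0.1, dropout=0.3."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["dropout"] - 0.3) ** 2


space = {"learning_rate": (0.001, 1.0), "dropout": (0.0, 0.9)}
best_score, best_params = random_search(toy_objective, space, max_trials=50)
```

The budget here is a trial count; a wall-clock limit could be used in the same loop instead.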

[Does/Does not] RMSE XX penalize high variance as much as MSE because the root operation reduces the importance of higher values. does not

A XX approach means that the model is split between workers. You can use TensorFlow YY to implement this. model-parallel, mesh

[GCP service] If you need a low latency feature extraction from BQ, export it to XX memorystore

[True/False] Memorystore is a fully managed service True

Vertex AI Model Monitoring is a fully managed solution for monitoring XX that, by definition, requires minimal YY. training-serving skew, maintenance

[True/False] Model retraining fixes training-serving skew. False

Post-training XX is the recommended option for reducing model latency when re-training is not possible. quantization

Pruning helps in compressing model size, but it is expected to provide less XX improvements than quantization. latency

Clustering helps in compressing model XX, but it [does/does not] reduce latency. size, does not

XX with YY improves performance on the minority class while speeding up convergence and keeping the predictions calibrated. downsampling, upweighting
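One way to picture the card above, as a sketch under assumptions: keep only 1 in `factor` majority-class examples, then give each kept example a weight of `factor` so the weighted class balance still matches the original data. The helper name, the `(features, label)` tuple layout, and the toy data are hypothetical.

```python
import random


def downsample_and_upweight(examples, majority_label, factor, seed=0):
    """Return (features, label, weight) triples after downsampling the majority."""
    rng = random.Random(seed)
    majority = [e for e in examples if e[1] == majority_label]
    minority = [e for e in examples if e[1] != majority_label]
    # Keep 1/factor of the majority class, chosen at random.
    kept = rng.sample(majority, len(majority) // factor)
    weighted = [(features, label, float(factor)) for features, label in kept]
    weighted += [(features, label, 1.0) for features, label in minority]
    return weighted


# 100 negatives and 5 positives: a heavily imbalanced toy dataset.
examples = [([i], 0) for i in range(100)] + [([i], 1) for i in range(5)]
weighted = downsample_and_upweight(examples, majority_label=0, factor=10)
```

The weight column would then be passed to the training loss as a per-example weight.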

[Increasing/Decreasing] XX the model’s complexity boosts the predictive ability of the model, which is expected to optimize loss convergence when underfitting. increasing

[May/May not] Canary deployments XX affect user experience. may

Multi-armed bandit deployment approach may affect user experience, even if on a small subset of users. This approach could cause XX when moving between services. downtime

Duplicating the preprocessing adds unnecessary dependencies between the training and serving code and could cause XX. training-serving skew

[True/False] For data type input error, it is better to combine all preprocessing steps in a function, and update the default serving signature to accept the input data type wrapped into the preprocessing function call. True

[Increases/Decreases] Self-managed -> XX running cost decreases

[Overfitting/Underfitting] Training loss down but validation loss is going up -> XX overfitting

XX-learning is a model-free reinforcement learning algorithm Q

[Q, D. D. P. G.] The main reinforcement learning algorithms are XX and YY Q-learning, deep deterministic policy gradient

[Supervised/Unsupervised] K-Nearest Neighbors (KNN) is a XX ML method supervised

[True/False] BQML ARIMA can manage time-series forecasts and automatically handle anomalies and seasonality. True

[True/False] Linear regression can not cut off seasonality False

Not all frauds are caused by strange movements (outliers) -> use XX models to decipher them, e.g. XGBoost complex

For unsatisfactory medical models, it is better to deploy a XX model with a classification threshold rather than to try to reduce overfitting in a DNN. logistic regression

If you already have an ML workflow consisting of containers, it is better to use XX rather than Cloud Composer Kubeflow Pipelines

If you do not have a fairly uniform distribution, you can use XX scaling which is able to compress data ranges into log(x) log

XX is similar to scaling but uses the number of standard deviations each value lies from the mean to scale. z-score
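The two normalization cards above can be sketched with the standard library only. The helper names are hypothetical, the natural logarithm is assumed for log scaling, and the population standard deviation is assumed for the z-score.

```python
import math
import statistics


def log_scale(values):
    """Compress a wide, skewed range by replacing x with log(x); x must be > 0."""
    return [math.log(v) for v in values]


def z_score(values):
    """Express each value as its number of standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


skewed = [1.0, 10.0, 100.0, 1000.0]
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, std 2

print(log_scale(skewed))
print(z_score(scores))
```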

A model trained on the outputs of many different models for the same training data metamodel

XX: create many different models for the same data and use the combined output. ensemble

Embeddings are used for XX data categorical

When triggering on uploads of data, it is better to use Cloud Storage, which triggers XX, than to use Pub/Sub cloud functions

A XX is a deep learning model that can give a different importance to each part of the input data. transformer

XX Cloud TPUs are approximately 70% cheaper compared to normal cloud TPUs preemptible

[Correlated/Uncorrelated] Partial least squares creates new variables that are XX uncorrelated

Maximum likelihood estimation requires XX of the variables independence

[Does/Does not] Scale-tiers XX require the application to be containerized does not

XX is a tool to check the performance of TF models helping to obtain an optimized version. TFProfiler

[True/False] k-anonymity anonymizes the data in such a way that it is impossible to identify person-specific information, but you maintain all the information contained in the record True

Bagging and boosting are examples of XX ensemble learning

[Dataprep/Data fusion] Build visual data pipelines for integrating data data fusion

[Dataprep/Data fusion] Interact with the content of data to iteratively refine and combine it. dataprep

XX is a lifelike conversational AI with state-of-the-art virtual agents. It has two versions: XX YY (advanced) and XX ZZ (standard) dialogflow, CX, ES

XX is Google Cloud's multi/hybrid cloud solution anthos

[TFX/Kubeflow] XX gives you more control over the whole dev to prod life-cycle compared to YY TFX, Kubeflow

Decision trees are explainable as they are and do not need to use Vertex XX explainable AI

[Parametric/Nonparametric] K-nearest neighbours and Decision trees are examples of XX algorithms nonparametric

XX is a service which provides engineer-to-engineer assistance for both GCP and TensorFlow and is free for big enterprises using GCP. tensorflow enterprise

[True/False] Tensorflow I/O can directly read some file formats, such as Parquet into a TF model True

Naive Bayes and K-Nearest Neighbours are examples of XX learning lazy

Tensorflow XX is a Python library for statistical analysis and probability which can be processed on TPU and GPUs probability

[True/False] CNNs are supported by BQML since it can store image data False

[True/False] TFX and Kubeflow are managed services False

[E., S., P., G.] You can save cost by: using notebooks as XX instances, setting up an automatic YY routine, using ZZ VMs, and getting monitoring alerts about AA usage. ephemeral, shutdown, preemptible, GPU

The XX tool is an open source tool that can show you which features affect your model the most. It also lets you interactively try new inferences what-if

Language Interpretability Tool (LIT) is an open source tool developed specifically for the explanation and visualization of XX models NLP

XX is an explainability technique for deep neural networks which gives info about what contributes to the model’s prediction. integrated gradients

What-if-tool is for structured data, not XX images

Array and Struct transformations are not available in AutoML but are in XX BQML

XX is for multi-class classification what Sigmoid is for logistic regression softmax
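A short sketch of the relationship on this card: softmax turns a vector of logits into class probabilities that sum to 1, and with two classes where one logit is fixed at 0 it reduces to the sigmoid. The function names below are chosen for this example only.

```python
import math


def sigmoid(z):
    """Probability of the positive class in binary logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Probabilities over all classes; subtracts the max for numerical stability."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


three_class = softmax([1.0, 2.0, 3.0])   # one probability per class
two_class = softmax([2.0, 0.0])          # first entry matches sigmoid(2.0)
```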

BigQuery I/O connector is the way to connect directly to BigQuery from XX dataflow

[True/False] You can do canary deployment with solely cloud build False

[B, C. S] Avoid storing ML data in block storage like filestore, use XX and YY instead BigQuery, Cloud Storage

In Vertex AI, there are two types of logs: XX logging, which logs data connected to the container, and YY logging, which logs access and latency information container, access

[On/Off] It is important to turn XX eager mode, which lets you execute operations one by one for a TensorFlow model, before deploying to production. off

Vertex AI datasets manage CSV files automatically, but you need to have a header with only XX characters, YY in place of blank spaces, and ZZ as the delimiter. alphanumeric, underscore, comma

[True/False] XRAI is an optimization of the integrated gradient method True

XX can be calculated and used for an affinity system trained with a small amount of data cosine similarity
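For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below assumes plain Python lists as the item or user vectors; the example vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
```

Values near 1 mean the two vectors point the same way, 0 means unrelated, and -1 means opposite.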

If you export a Vertex AI dataset, no additional copies of the data are generated; only a XX file with the Cloud Storage YY is given. JSON, URIs

Cloud composer is for XX, not transformation orchestration

GCP service: XX is good to use for cleaning a dataset dataprep

[True/False] You can import a tensorflow model to BigQuery if the model type is supported by BigQuery. True

To send out messages about predictions to users, build a notification system on XX and use the XX Cloud Messaging server to send the notifications firebase

Better to store metadata information about BQ tables in the XX data catalog

[Is/Is not] If security is important, employing and deploying directly from AI platform prediction XX an option is not

Search the XX before making your own feature feature store

[Is/Is not] It XX recommended to always train with checkpoints and save them in cloud storage with a folder for each experiment is

If using Vertex AI pre-built containers, ensure that the model artifact has exactly the following filename: TF, Scikit-learn [YY or ZZ], XGBoost, PyTorch saved_model.pb, model.joblib or model.pkl, model.bst, model.pth

If possible, recommended to use XX pipelines to orchestrate the ML workflow vertex AI

Do train-serving skew detection by setting up a model XX job monitoring

Do data drift detection by turning on XX which does not require access to the source data drift detection

You can also use XX in Vertex Explainable AI to detect data drift or train-serving skews. feature attributions

For big data, use XX for EDA rather than notebooks BigQuery

By default, Dataflow assigns both public and private IP addresses to workers. However, the XX IP can be disabled to boost security public

[True/False] Simple models might not train faster with GPUs or distributed training since they do not benefit from hardware parallelism True

[Does/Does not] Scikit-learn XX support distributed training does not

[True/False] For small data sizes, it is better to use a high-end machine rather than a distributed set of machines True

Asynchronous distributed training with powerful GPUs requires a lot of network XX bandwidth

If the request volume fluctuates, it can be a good idea to XX the instance scaling fix

Use XX encoding rather than array of floats to encode images base64
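A minimal sketch of packing image bytes into a JSON prediction request. The `build_instance` helper is hypothetical, and the input name `image_bytes` is an assumption that has to match the deployed model's serving signature. The `{"b64": ...}` wrapper is the convention TensorFlow serving uses for binary inputs.

```python
import base64
import json


def build_instance(image_bytes, input_name="image_bytes"):
    """Wrap raw image bytes as a base64 string for a JSON request body."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {input_name: {"b64": encoded}}


# Stand-in for the contents of a real image file.
fake_image = bytes([0, 255, 16, 32])
payload = json.dumps({"instances": [build_instance(fake_image)]})
```

A base64 string is far more compact in JSON than a nested list with one float per pixel channel.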

[True/False] You can not upload a TFX or Kubeflow SDK pipeline to AI platform pipelines False
