GCP: Own questions

This exercise was created 2021-10-19 by Pontusnord. Number of questions: 340.





  • Open-source distributed file system that provides high throughput access to application data by partitioning data across many machines. (abbreviation) HDFS
  • Framework for job scheduling and cluster resource management (task coordination) YARN
  • MapReduce; Operation to be performed in parallel on small portion of dataset, output of operation, operation to combine the results Map, key-value pair, reduce
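The map/key-value/reduce card above can be sketched as a toy word count in plain Python (not Hadoop code; the function names are just illustrative):

```python
from collections import defaultdict

def map_phase(chunk):
    # Map: emit a (word, 1) key-value pair for each word in a data chunk.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # The framework groups values by key between the map and reduce steps.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine the list of values per key into one result.
    return {key: sum(values) for key, values in groups.items()}

chunks = ["big data big", "data lake"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(pairs))
# counts == {"big": 2, "data": 2, "lake": 1}
```

Each map call is independent, which is what lets the framework run it in parallel across machines before the shuffle.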
  • Apache ecosystem; Data warehouse with HDFS storages and enable SQL like queries and MapReduce abstractions Hive
  • Apache ecosystem; High level scripting language for ETL workloads pig
  • Apache ecosystem; Framework for writing fast distributed programs for data processing and analysis such as MapReduce jobs with fast in-memory approach. Spark
  • Apache ecosystem; stream processing framework for bounded and unbounded data sets. Has a message queue. Flink/Kafka
  • Apache ecosystem; Programming model to define and execute data processing pipelines with ETL, batch and stream processing. Beam
  • IAM member types; single person, non-person (application), multiple people google account, service account, google group
  • IAM roles; 1. Owner, Editor viewer 2. Finer-grained control managed by GCP 3. Finer-grained control combination primitive roles, predefined roles, custom roles
  • IAM best practice; Use XX roles when they exist over YY roles. predefined, primitive
  • 1. GCP monitoring, logging and diagnostics solution, Main functions; (D, E, M, A, T, L) stackdriver, debugger, error reporting, monitoring, alerting, tracing, logging
  • Concept; Primary objective is data analysis for large volumes of data, complex queries and uses data warehouses OLAP
  • Concept; Primary objective is data processing, manage databases and modifying data using simple queries OLTP
  • Concept; Stores data by row Row format
  • Concept; Stores data by column column format
  • Concept; Gives you infrastructure pieces such as VMs but you have to maintain them (abbreviation), GC option IaaS, Compute engine
  • Concept; Gives you infrastructure pieced together so you can just deploy your code on the platform (abbreviation), GC option PaaS, App engine
  • Compute choice, mainly used for; Websites, mobile apps, gaming backends, RESTful APIs, IoT apps app engine
  • Compute choice, mainly used for; containerised workloads, cloud-native distributed systems and hybrid applications kubernetes engine
  • Compute choice, mainly used for; Currently deployed and on-premise software that you want to run on the cloud, any workload requiring a specific OS or configuration compute engine
  • Preemptible VMs are around XX% cheaper and terminate after YY hours 80, 24
  • Fully managed block storage (SSD/HDD) suitable for VM/containers. persistent disk
  • Affordable object/blob storage suitable for e.g images, videos cloud storage
  • Storage class; High performance with no minimum storage duration standard
  • Storage class; Access once per month or less. 30-day minimum duration nearline
  • Storage class; Access infrequent data. 90-day minimum duration coldline
  • Storage class; Lowest cost for backup and disaster recovery. 365-day minimum duration archive
  • GCP service; Fully managed relational database for SQL (MySQL, PostgreSQL), not scalable and fits small GBs data. Good for YY workloads. cloud SQL, OLTP
  • GCP service; Mission-critical relational database. Combines benefits of relational and non-relational databases. Supports YY. cloud spanner, horizontal scaling
  • GCP service; Columnar database for high throughput and low latency e.g IoT, user analytics, time-series for non-structured key/value data. Has a row index known as a YY. Ideal for handling large amounts of data for a long period of time bigtable, row key
  • True/False; BigTable supports SQL queries? False
  • Bucket names must be globally unique
  • GCP service; Highly scalable NoSQL database for structured data for web and mobile applications cloud datastore
  • Storage choice; your data is unstructured cloud storage
  • Storage choice; your data is structured and you're doing transactional workload using NoSQL cloud datastore
  • Storage choice; your data is structured and you're doing transactional workload using SQL [One database, horizontal scalability] cloud SQL, cloud spanner
  • Storage choice; your data is structured and you're doing analytics workload [ms latency, s latency] cloud bigtable, bigquery
  • Dataproc provides XX using hadoop YY metric autoscaling, YARN
  • BQ can submit jobs via; [W, C, R] web UI, command-line tool, REST API
  • GCP service; Clean and transform data with UI cloud dataprep
  • IAM project role; Permissions for read-only actions that do not affect state, such as viewing (but not modifying) existing resources or data. viewer
  • IAM project role; Viewing the data and permissions for actions that modify state, such as changing existing resources. editor
  • IAM project role; Viewing the data, modify and change existing resources. Manage roles and set up billing for project owner
  • Default BigQuery encoding UTF-8
  • GCP service to use: Migrate hadoop job to google cloud; without rewrite, with rewrite cloud dataproc, maybe cloud dataflow
  • BigQuery streaming restrictions; max row size, max throughput in records/s per project 1 MB, 100,000
  • In BigQuery, use XX[S] and YY[A] to consolidate the data instead of splitting it up into smaller tables structs, arrays
  • ML model; Forecast a number linear regression
  • ML model; Classify with binary or multiclass options logistic regression
  • ML model; Recommend something matrix factorization
  • ML model; Explore data clustering
  • IoT data streaming expects [some/no] delays no
  • [True/False] You can write your own connectors with Apache Beam? And you can build pipelines using Java, Python and Go? True
  • [True/False] Dataflow can autoscale workers? True
  • [True/False] You can't connect google sheets data together with BigQuery data in data studio? False
  • A data lake is usually in a cloud storage bucket
  • Cleaned data is typically stored in the data warehouse
  • BigQuery can be seen as a serverless data warehouse
  • The ETL is usually done between the data XX and the data YY lake, warehouse
  • [True/False] On federated (external) queries in BigQuery, you get caching False
  • GCP service; Scales to GB and TB. Ideal for back-end database. Record based storage cloud SQL
  • GCP service; Scales to PB. Easily to connect external data sources for ingestion. Column based storage BigQuery
  • [True/False] Cloud SQL can be considered as RDBMS True
  • Fast in memory analysis in BigQuery BI engine
  • GCP service; A fully managed and highly scalable data discovery and metadata management service. cloud data catalog
  • GCP service; Fully managed service designed to help you discover, classify, and protect your most sensitive data Cloud data loss prevention
  • GCP service; A fully managed scalable workflow orchestration service. Can automate pipelines cloud composer
  • Cloud composer automated pipelines are written in python
  • Data storage and ETL options on GCP; your data is relational [S, SP] Cloud SQL, Cloud Spanner
  • Data storage and ETL options on GCP; your data is NoSQL [F, B] Cloud firestore, cloud bigtable
  • [True/False] A cloud bucket is associated with a certain region True
  • Storage; XX names can be set to private but YY names can't and should never be sensitive object, bucket
  • Cloud storage; Google handles the encryption and wraps the data encryption keys with a key encryption key; you control what the top-level encryption key is CMEK
  • Cloud storage; Customer-supplied encryption keys CSEK
  • Bucket access control; allows you to use IAM alone to manage permissions. IAM applies permissions to all the objects contained inside the bucket or groups of objects with common name prefixes., [is/not recommended] uniform, is
  • Bucket access control; enables you to use IAM and ACLs together to manage permissions. [is/not recommended] fine-grained, not
  • Are a legacy access control system for Cloud Storage designed for interoperability with Amazon S3. [is/not recommended] ACLs, not
  • Workload type; Fast, reveal snapshot, simple query, 80% writes and 20% reads transactional
  • Workload type; Read the whole dataset, complex query, 20% writes and 80% read analytical
  • GCP service; NoSQL database built for global apps that lets you easily store, sync, and query data for your mobile and web apps - at global scale. Cloud firestore
  • GCP service; most cost effective to store relational data cloud SQL
  • GCP service; Big database with relational data requiring to be globally distributed cloud spanner
  • GCP service; Requires high throughput and ultra-low latency for non-relational data cloud bigtable
  • Which GCP service is the most cost effective for relational data; BigQuery vs Bigtable BigQuery
  • GCP service; Good as a relational data lake since it can handle the 3rd-party RDBMSs MySQL, PostgreSQL, and MS SQL Server cloud SQL
  • You want XX scalable data warehouse one
  • In BigQuery, permissions are at XX level dataset
  • [True/False] You can authenticate IAM users using G Suite and Gmail? True
  • Concept; SQL query and look like a read-only table with more fine grained control to only share tables and not the whole dataset. view
  • [True/False] You can't run BigQuery to export data from a view False
  • Precomputed views that periodically cache the results of a query for increased performance and efficiency materialized views
  • [True/False] Cached queries on BigQuery are charged False
  • BigQuery can XX schemas but it's not 100% sure to work autodetect
  • BigQuery service which provides connectors and pre-built load jobs. Good for EL jobs [abbreviation] DTS
  • BigQuery service; Can handle late data data backfill
  • [True/False] BigQuery supports user-defined functions in SQL and Javascript True
  • A BigQuery user-defined function (UDF) is stored as an XX and YY[can/can't] be shared object, can
  • Schema design; Stores data efficient and saves space normalized
  • Schema design; Allows duplicate field values per column. Is fast to query and can be easily parallelised denormalized
  • Schema design; is not efficient for GROUP BY filtering denormalized
  • For RDBMS, JOINS are [expensive/inexpensive] expensive
  • A struct has the type XX in BigQuery RECORD
  • An array has the type XX in BigQuery REPEATED
  • Having super XX schemas improve performance since BigQuery is YY based wide, column
  • Function; Unpacks a field to a single row value UNNEST
  • Function; Aggregates field to array ARRAY_AGG
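As a plain-Python analogue of the UNNEST and ARRAY_AGG cards (not BigQuery SQL; the sample `orders` data is invented for illustration):

```python
# Rows with a repeated (ARRAY) field, as in a denormalized BigQuery table.
orders = [
    {"order_id": 1, "items": ["pen", "ink"]},
    {"order_id": 2, "items": ["pen"]},
]

# UNNEST analogue: flatten each array element onto its own row.
flat = [
    {"order_id": o["order_id"], "item": item}
    for o in orders
    for item in o["items"]
]

# ARRAY_AGG analogue: aggregate rows back into one array per key.
agg = {}
for row in flat:
    agg.setdefault(row["item"], []).append(row["order_id"])
# agg == {"pen": [1, 2], "ink": [1]}
```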
  • BigQuery's ways of partitioning tables [T C, I T, I R] time column, ingestion time, integer range
  • [True/False] BigQuery does not support JSON parsing False
  • Use the SQL function XX to format values to the same format cast
  • GCP service; Fully managed, cloud-native, enterprise data integration service for quickly building and managing data pipelines cloud data fusion
  • Which GCP service is considered easiest to use since it has a simple GUI; cloud dataproc, cloud data fusion, cloud dataflow cloud data fusion
  • GCP services, ETL to solve data quality issues; recommended, latency/throughput issues, reusing spark pipelines, need for visual pipeline building [no cloud prefix] BigQuery/Dataflow, bigtable, dataproc, data fusion
  • Labels can be used on [D, T] datasets, tables
  • Cloud dataproc can [automatically/manually] scale your cluster automatically
  • GCP Service; Cloud dataproc can write with petabit bisection bandwidth to [C S, B, C B] cloud storage, bigquery, cloud bigtable
  • For optimising dataproc; make sure that the cluster region and storage region are close
  • For optimizing dataproc; do not use more than XX input files 10000
  • Storage option; Datastore. Unstructured data cloud storage
  • Storage option; Large amount of sparse data. HBase-compliant. Low latency and high scalability cloud bigtable
  • Storage option; Data warehouse. Storage API BigQuery
  • Since autoscaling exist, start with a XX cluster and it will YY if needed small, upscale
  • Data fusion component; used for handlings connections, transforms and data quality wrangler
  • Cloud data fusion is made for XX[streaming/batch] data batch
  • [True/False] Cloud data fusion wrangler can access data from other providers than GCP True
  • Color to indicate in airflow that a DAG has not been run since a previous DAG failed pink
  • [True/False] A PCollection can both represent streaming and batch data? True
  • Dataflow handles late arriving data using "smart" data watermarking
  • Cloud dataflow; Handles parallel intermediate jobs such as filtering data, formatting, and extracting. And you will need to provide a YY ParDo, DoFn
  • XXByKey is more effective than GroupByKey since dataflow knows how to parallelise it combine
  • Additional input to a ParDo transform e.g inject additional data at runtime side input
  • Dataflow SQL integrates with Apache Beam SQL and support XX syntax ZetaSQL
  • Processing streaming data; Pub/sub -> XX -> YY or ZZ [No prefix] dataflow, bigquery, bigtable
  • Pub/Sub is fast since it stores messages in multiple XX for up to YY days by default locations, 7
  • Apache beam window type; Non-overlapping intervals, Dataflow name fixed-time window, tumbling window
  • Apache beam window type; Used for computing i.e gives a time window every interval time, GCP name sliding window, hopping window
  • Apache beam window type; Minimum gap duration between windows e.g a website visit with bursting data session window
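The three window types above can be sketched with integer timestamps in plain Python (a conceptual model, not Beam's actual API; window bounds here follow the usual half-open convention):

```python
def fixed_windows(events, size):
    # Tumbling (fixed-time) windows: non-overlapping, aligned intervals.
    windows = {}
    for ts, value in events:
        start = (ts // size) * size
        windows.setdefault((start, start + size), []).append(value)
    return windows

def sliding_windows(events, size, period):
    # Hopping (sliding) windows: a window of `size` starts every `period`,
    # so one event can land in several overlapping windows.
    windows = {}
    for ts, value in events:
        start = (ts // period) * period
        while start > ts - size and start >= 0:
            windows.setdefault((start, start + size), []).append(value)
            start -= period
    return windows

def session_windows(timestamps, gap):
    # Session windows: group events separated by less than `gap`.
    sessions, current = [], [timestamps[0]]
    for ts in timestamps[1:]:
        if ts - current[-1] < gap:
            current.append(ts)
        else:
            sessions.append(current)
            current = [ts]
    sessions.append(current)
    return sessions
```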
  • Cloud dataflow trigger; datetime stamp Event time
  • Cloud dataflow trigger; Triggers on the time a element is processed processing time
  • Cloud dataflow trigger; Condition of data contained in the element data-driven
  • Cloud dataflow trigger; Mix of different triggers composite
  • Acceptable time for data insight with BigQuery, Cloud Bigtable s, ms
  • For cloud Bigtable, you need the data to be pre-XX sorted
  • Optimizing cloud Bigtable; XX related data, YY data evenly, place ZZ values in the same row group, distribute, identical
  • Too XX data results in worse performance for Bigtable; it should be YY [data quantity] little, >300 GB
  • Which is the most performance heavy work for BigQuery? I/O[number of columns] or Computing[function uses] I/O
  • BigQuery function to get back previous values LAG
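What LAG does over an ordered column can be mimicked over a Python list (a sketch of the semantics, not BigQuery itself):

```python
def lag(values, offset=1, default=None):
    # LAG analogue: for each row, return the value `offset` rows earlier,
    # or `default` when there is no earlier row.
    return [
        values[i - offset] if i - offset >= 0 else default
        for i in range(len(values))
    ]

# lag([10, 20, 30]) == [None, 10, 20]
```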
  • BigQuery cached queries are stored for 24 hours
  • [True/False] WITH clauses inhibits BigQuery caching True
  • Self-join, i.e. joining a table with itself, is [good/bad] in BigQuery bad
  • Ordering i.e ORDER BY, has to be performed on XX worker a single
  • Approximate functions should be used if an error of around XX is tolerable 1%
  • Approximate function to find the top element APPROX_TOP_COUNT
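The result shape of APPROX_TOP_COUNT (the top-n values with their counts) can be shown with `collections.Counter`; note that Counter is exact, whereas BigQuery's approximate functions trade a small error (~1%) for speed:

```python
from collections import Counter

def approx_top_count(values, n):
    # Same output shape as APPROX_TOP_COUNT: the n most frequent
    # values with their counts (computed exactly here).
    return Counter(values).most_common(n)

# approx_top_count(["a", "b", "a", "c", "a", "b"], 2) == [("a", 3), ("b", 2)]
```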
  • Ordering should always be the [first/last] thing you do last
  • To optimize BigQuery, put big tables on the [right/left] left
  • Methodology; Looking back at data to gain insight [abbreviation] BI
  • GCP service; Build, deploy, and scale ML models faster, with pre-trained and custom tooling within a unified AI platform vertex AI
  • GCP service; Train high-quality custom machine learning models with minimal effort and machine learning expertise AutoML
  • ML methodology; Break down text to tokens (word or sentences) and labels them syntactic analysis
  • ML methodology; Group text into negative, positive and neutral together with a score of how much it's expressed. sentiment analysis
  • [True/False] You can query out to a pandas dataframe using BigQuery using %%bigquery df, in notebook True
  • GCP service; An intelligent cloud data service to visually explore, clean, and prepare data for analysis and machine learning. Dataprep
  • GCP services for the "prepare" phase of a ML project [.p .w, .c] (without prefix) dataprep, dataflow, dataproc
  • GCP services for the "preprocess" phase of a ML project [.w, .c, .y] (without prefix) dataflow, dataproc, BigQuery
  • GCP service; Lets you work with human labelers to generate highly accurate labels for a collection of data that you can use in machine learning models. data labeling service
  • GCP service; Enable using machine learning pipelines to orchestrate complicated workflows running on Kubernetes. kubeflow
  • GCP service; Lets you create and execute machine learning models in BigQuery using standard SQL queries. Can iterate on models BigQuery ML
  • GCP service; Repository for ML components AI hub
  • You can import models from XX to BigQuery ML tensorflow
  • Pre-trained models only yield a good result if the data applied is [common/uncommon] common
  • [True/False] AutoML can train from zip files True
  • For AutoML, labels [can/can't] contain _ and [can/can't] contain special characters can, can't
  • For AutoML, custom models are [permanent/temporary] temporary
  • For an AutoML model, one can predict using an XX command and YY file curl, JSON
  • It's recommended to use [a single/multiple] ML models to solve a complicated problem multiple
  • For Auto ML vision; Images need to be encoded in XX and can be at most YY in file size base64, 30 MB
  • Auto ML vision can handle XX to YY labels 0, 20
  • Auto ML vision models work best if there are XX times more items of the most common label than the least common 100
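The base64 encoding and 30 MB limit from the vision cards can be sketched like this (the helper name and limit check are illustrative, not an AutoML API):

```python
import base64

def encode_image(raw_bytes, max_size=30 * 1024 * 1024):
    # Reject payloads over the assumed 30 MB limit, then base64-encode
    # so the bytes can be embedded in a JSON prediction request.
    if len(raw_bytes) > max_size:
        raise ValueError("image exceeds the 30 MB limit")
    return base64.b64encode(raw_bytes).decode("ascii")

# encode_image(b"\x89PNG") == "iVBORw=="
```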
  • For Auto ML NLP; model will be deleted after XX if not used and after YY if used 60 days, 6 months
  • Auto ML NLP is for [structured/unstructured] text data while auto ML tables is for [structured/unstructured] text data unstructured, structured
  • For auto ML tables; the data can be between AA and BB million rows, CC and DD columns and must be <EE 1000, 100, 2, 1000, 100 GB
  • For auto ML tables prediction; the maximum input size for a BQ table or multiple CSV files is XX and for a single CSV file, the maximum size is YY 100 GB, 10 GB
  • ML; Remove features with XX or more null values 50%
  • ML; Decrease regularization => [increase/decrease] in overfitting increase
  • Dataflow is for [known/unknown] data sizes unknown
  • Dataproc is for [known/unknown] data sizes known
  • HDFS [does/does not] scale well does not
  • Using persistent disks for an HDFS cluster means the data is XX when the cluster is shut down lost
  • GCP service; recommended for time-series data bigtable
  • Stackdriver is now called google cloud's operations suite
  • [True/False] Firestore supports flexible schemas True
  • VM instances with additional security control Shielded VMs
  • In cloud firestore, having multiple indexes [do/do not] lead to bigger file size do
  • GCP service; Is a good replacement for Java ELT pipelines cloud dataflow
  • GCP service; Is a good replacement for MongoDB cloud firestore
  • GCP service; Is a Jupyter notebook + VM service cloud datalab
  • A XX can be used to remove many forms of sensitive data such as government identifiers data loss prevention API
  • Transactional databases have a [fixed/non fixed] schema fixed
  • XX databases are structured data stores analytical
  • You can use XX to access HBase in cloud Bigtable HBase API
  • Custom file formats should always be stored in XX cloud storage
  • A document database provides indexing on XX rather than single keys columns
  • XX models are designed to support drilling down and slicing and dicing OLAP
  • GCP service; is the only globally available OLTP database in GCP cloud spanner
  • Cloud Spanner has a row limit of [size] 4 GB
  • Cloud Spanner uses XX as export connector and can export to Apache YY and ZZ format dataflow, Avro, CSV
  • Cloud Bigtable uses XX for import and export and can export to YY[C], Apache ZZ and AA[S] dataflow, cloud storage, Avro, SequenceFile
  • Cloud firestore indexes; Created by default for each property, indexes multiple values for an entity built-in, composite
  • Cloud firestore uses a SQL like language called GQL
  • For export from firestore to BQ, entities in the export have to have a consistent schema; property values larger than XX are truncated 64 KB
  • Cloud firestore [is/is not] suitable for application requiring low-latency writes (< 10 ms) is not
  • XX SQL dialect is preferred for BigQuery standard
  • BigQuery can use CSV, AA[J], BB[A], CC[O], DD[P] JSON, Avro, ORC, Parquet
  • If a CSV file is not in XX, BigQuery will try to convert it but it might not be correct UTF-8
  • XX is the preferred format for loading data since the data blocks can be read in parallel. Avro
  • You can define an XX with streaming inserts to BigQuery to detect duplicates. However, this will reduce YY insertID, throughput
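The insertID dedup behaviour can be modelled as a best-effort set lookup (a sketch of the idea only; BigQuery keeps insertIDs for a limited window, and the sample rows are invented):

```python
def deduplicate(rows, seen=None):
    # Best-effort dedup keyed on insertId, mimicking how BigQuery
    # streaming inserts drop rows whose insertId was recently seen.
    seen = set() if seen is None else seen
    unique = []
    for row in rows:
        if row["insertId"] not in seen:
            seen.add(row["insertId"])
            unique.append(row)
    return unique

rows = [
    {"insertId": "a", "value": 1},
    {"insertId": "a", "value": 1},  # retry of the same event
    {"insertId": "b", "value": 2},
]
# deduplicate(rows) keeps only the first "a" and "b"
```

The set lookup on every row is why supplying insertIDs costs throughput.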
  • In BQ, XX tables rather than joining tables denormalize
  • Fully managed serverless platform for running containers Cloud run
  • Managed Redis service used for caching Cloud Memorystore
  • XX is the successor to SSL, which is now legacy TLS
  • Cloud spanner, maximum CPU utilization to target; for regional, for multiregional 65%, 45%
  • A Cloud Spanner node can handle XX of data [volume] 2 TB
  • Cloud Memorystore; expiration time for memory given in seconds TTL
  • Lifecycle policy will change the storage type on an object, but will not delete it. However, a XX policy can. data retention
  • [True/False] Loading data with similar filenames, e.g timestamps, can cause hotspot for cloud storage True
  • [CPU/Storage] When scaling nodes in cloud spanner, XX utilization is more important than YY utilization. CPU, Storage
  • BigQuery insert only supports [Fileformat] JSON
  • Is a metric on how well service-level objective is being met [Abbreviation] SLI
  • Is a U.S financial reporting regulation governing publicly traded companies [Abbreviation] SOX
  • Is a U.S healthcare regulation with data access and privacy rules [Abbreviation] HIPAA
  • Defines responsibilities for delivering a service and consequences when they're not met. SLA
  • Is the number of actual positive cases that were correctly identified during ML training. Recall
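The recall definition above, TP / (TP + FN), computed over toy label lists:

```python
def recall(y_true, y_pred):
    # Recall = TP / (TP + FN): the share of actual positives
    # that the model correctly identified.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

# recall([1, 1, 1, 0], [1, 0, 1, 1]) == 2 / 3
```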
  • Is a GCP ML tool to make chatbots Dialogflow
  • Is a fully managed service for securely connecting and managing IoT devices, from a few to millions. Can ingest data to other GCP services. IoT Core
  • N4 instance has [higher/lower] IOPS than an N1 higher
  • [True/False] Cloud dataflow does not support python False
  • Repeated messages in Cloud Pub/Sub can be a sign of no message acknowledgment
  • A data pipeline that captures each change in a source system and stores it in a data store. Change data capture
  • Is a serverless managed compute service for running code in response to events that occur in the cloud Cloud functions
  • Is a distributed architecture that is driven by business goals. Microservices are a variation of it [Abbreviation] SOA
  • [True/False] Cloud functions can run python scripts True
  • A XX can distribute work across region if it's not supported by the service (or specified in the question) global load balancer
  • Nested and repeated fields can be used to reduce the amount of XX in BigQuery JOINS
  • Consists of identically configured VM groups and should only be used when migrating legacy clusters from on-prem. [Abbreviation] MIGs
  • Is kubernetes' way of representing storage allocated or provisioned for use by pods PersistentVolumes
  • The XX are used to designate kubernetes pods with unique identifiers StatefulSets
  • An XX is an object that controls external access to services running in kubernetes cluster ingress
  • GCP serverless services do not require conventional infrastructure provisioning but can be configured using .XX files in app engine yaml
  • Cloud functions are for XX processing. Not continually monitoring metrics event-driven
  • Is the command line interface for cloud Bigtable cbt
  • To start MIGs, the minimum and maximum number of XX together with an instance YY are required. instances, template
  • Is Kubernetes' command line interface kubectl
  • For Kubernetes engine, the CPU utilization for the whole XX (not cluster) is used to scale the deployment deployment
  • For App Engine, file that; configures the runtime e.g python version app.yaml
  • For App Engine, file that; is used to configure task queues queue.yaml
  • For App Engine, file that; is used to override routing rules dispatch.yaml
  • For App Engine, file that; is used to schedule tasks cron.yaml
  • In stackdriver (monitoring), the retention time in days for; Admin activity audit logs, System event audit logs, Access transparency logs, Data access audit logs 400, 400, 400, 30
  • Stackdriver service; is used to collect information about time to execute functions in a call stack Stackdriver Trace
  • Is Google's own encryption protocol for data in transit QUIC
  • Is a U.S act against collection of information online under the age 13 [Abbreviation] COPPA
  • Is a U.S program to promote a standard approach to assessment, authorisation and monitoring of cloud resources. [Abbreviation] FedRAMP
  • To run BigQuery queries, you need the role; roles/bigquery.XX jobUser
  • Entity & Kind => [GCP service] datastore, firestore
  • [True/False] Jobs with HDFS have to be rewritten to work on cloud Dataflow True
  • Dataflow; 1:1 relationship between input and output in python dataflow map
  • Dataflow; Non 1:1 relationship between input an output in python dataflow flatmap
  • Is BigQuery's command-line tool bq
  • Use XX to transfer data from on-prem to cloud [CLI tool] gsutil
  • Use storage XX service when transferring data from another cloud transfer
  • AI Platform notebooks are now in Vertex AI XX as well workbench
  • Pub/Sub + dataflow provides [in/out of] order processing in
  • The XX file system is where the actual data is stored in cloud Bigtable colossus
  • Keep names [long/short] in Bigtable => reduces metadata short
  • In BigQuery XX is ordering of data in stored format, and is only supported on partitioned tables clustering
  • In BigQuery, XX queries are queries queued to run when the resources are available. Does not count toward concurrence limit batch
  • The only way to achieve strong consistency in cloud Bigtable is to have one replica solely for reads while the other replica is for XX. failover
  • In cloud spanner, XX on key can decrease hotspots hashing
  • [True/False] A UUID generator usually creates the UUID based on sequential data, e.g. time, which can create hotspots in cloud spanner True
  • Universally unique identifiers = UUID
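The hashing-to-avoid-hotspots idea from the Spanner cards can be sketched as a hash-derived shard prefix on the key (the function name and 16-shard count are illustrative choices, not a Spanner API):

```python
import hashlib

def hashed_key(natural_key, shards=16):
    # Prefix the key with a hash-derived shard number so monotonically
    # increasing keys (e.g. timestamps or time-based UUIDv1 values)
    # spread across key ranges instead of all landing on one split.
    digest = hashlib.sha256(natural_key.encode()).hexdigest()
    shard = int(digest, 16) % shards
    return f"{shard:02d}#{natural_key}"

# The prefix is deterministic, so point lookups still work: recompute
# the hash of the natural key to find the full row key.
```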
  • In Bigtable; you want [many/a few] tall and [wide/narrow] tables. a few, narrow
  • No specified ingestion => XX partitioning ingestion time
  • [True/False] Clustering keys do not need to be integers or timestamps; they can be date, bool, geography, INT64, numeric, string, timestamp True
  • Parquet [is/is not] supported in drive, but [is/is not] in cloud storage is not, is
  • Cloud functions [is/is not] a good compute service for event processing is
  • GCP service; provides storage with a filesystem accessed from compute engine and kubernetes engine Cloud Filestore
  • Cloud dataflow can transmit summaries every [time] minute
  • [True/False] A subscription can receive information from more than one topic. False
  • Pub/Sub pull requires XX for the endpoint to pull via the API authorized credentials
  • Pub/sub push requires endpoint to be reachable via XX and have YY certificate installed DNS, SSL
  • [True/False] Cloud Dataprep can be used to gain BI insight and see missing and misconfigured data True
  • Are the only formats supported for export in cloud Dataprep [C, J] CSV, JSON
  • In Datastudio, XX connectors are designed to query data from up to 5 sources blended
  • [True/False] Better to conda/pip install in jupyter notebook than cloud shell True
  • ML; Evaluates a model by splitting the data into K-segments k-fold validation
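K-fold validation from the card above, sketched in plain Python (each fold plays validation once while the rest form the training set; real libraries shuffle first):

```python
def k_fold(data, k):
    # Split data into k folds by striding; yield (training, validation)
    # pairs where each fold serves once as the validation set.
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation

# list(k_fold([1, 2, 3, 4], 2)) gives two (train, validation) splits
```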
  • Models with high bias tend to oversimplify i.e underfit
  • Models with high variance tends to [underfit/overfit] overfit
  • You can use cloud dataproc + Spark XX for machine learning MLlib
  • Feature engineering can also reduce the XX to train besides improving accuracy time
  • Spark MLlib includes XX for frequent pattern mining. BQ ML and AutoML do not! association rules
  • In CloudSQL, an external replica is more for XX purposes and doesn't add throughput backup
  • [True/False] Datasets in BigQuery are immutable so the location can't be updated True
  • You need to use XX as an intermediary to send BigQuery data to different regions cloud storage
  • What is most important for upscaling/downscaling; CPU or storage utilisation? CPU
  • Cloud IoT core is mainly for XX while Pub/sub + Dataflow and Vertex AI handle the rest device management
  • Edge computing consists of edge device, XX device and cloud platform gateway
  • From an IoT device, data can also be sent by IoT core [M, S] MQTT, stackdriver
  • For high-precision arithmetic, use a GPU
  • Distributed training; enables synchronous distributed training on multiple GPUs on one machine. Each variable is mirrored across all GPUs MirroredStrategy
  • Distributed training; enables synchronous strategy which variables are not mirrored and both GPU and CPU are used. CentralStorageStrategy
  • Distributed training; enables synchronous distributed training on multiple GPUs on multiple machines. Each variable is mirrored across all GPUs MultiWorkerMirroredStrategy
  • A group of TPUs working together is a TPU pod
  • App engine is used for XX and should not be used for training machine learning models web applications
  • Anomaly detection is classified as [supervised/unsupervised] learning unsupervised
  • ML; is an algorithm to train binary classifiers based on artificial neurons perceptron
  • Principal component analysis is for XX dimension reduction
  • LXX regularisation should be chosen over LYY when you want less relevant features to have weights close to 0. 1, 2
  • ML; If you have an unbalanced dataset, use undersampling
  • ML; Area under curve = AUC
  • ML bias; human bias in favour of decisions made by machines over those made by humans automation bias
  • ML bias; when the dataset does not accurately reflect the state of the world. Reporting bias
  • ML bias; generalizes characteristic of an individual for the whole group. group attribution bias
  • GCP ML API; provides real-time analysis of time-series data and can provide anomaly detection cloud inference API
  • The cloud vision API supports a maximum XX images per batch 2000
  • For dialogflow; XX categorizes a speaker's intention for a single statement intents
  • For dialogflow; XX are nouns extracted from dialogs entities
  • For dialogflow; XX are used to connect a service to an integration fulfilments
  • For dialogflow; XX are applications that process end-user interactions such as deciding what to recommend. integrations
  • Google recommends a minimum sampling rate of XX Hz for speech-to-text 16000
  • [True/False] The gRPC API is only available with the advanced version of the translation API True
  • [True/False] For GCP translation API, there's a need to pass parameter into API when there's a special function call for translation e.g when importing the library to Python False
  • GCP service equivalent; HBase Cloud Bigtable
  • GCP service equivalent; Redis Cloud Memorystore
  • GCP service equivalent; Apache Beam, Apache Pig Cloud Dataflow
  • GCP service equivalent; Apache Airflow Cloud Composer
  • GCP service equivalent; MongoDB Cloud Firestore
  • GCP service equivalent; Apache Flink Cloud Dataflow
  • GCP service equivalent; Cassandra Cloud Bigtable
  • GCP service equivalent; Apache Kafka Pub/Sub
  • In stackdriver monitoring, only the XX log has a 30-day retention period compared to 400 days for the other logs Data access audit


Shared exercise: https://glosor.eu/ovning/gcp-own-questions.10640811.html