Certleader Professional-Data-Engineer questions are updated and all Professional-Data-Engineer answers are verified by experts. Once you have fully prepared with our Professional-Data-Engineer exam prep kits, you will be ready for the real Professional-Data-Engineer exam without a problem. We offer the Rebirth Google Professional-Data-Engineer dumps study guide. PASSED Professional-Data-Engineer on the first attempt! Here's what I did.

We also have free Professional-Data-Engineer dumps questions for you:

NEW QUESTION 1

After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You’ve loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison.
What should you do?

  • A. Select random samples from the tables using the RAND() function and compare the samples.
  • B. Select random samples from the tables using the HASH() function and compare the samples.
  • C. Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting. Compare the hashes of each table.
  • D. Create stratified random samples using the OVER() function and compare equivalent samples from each table.

Answer: B
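
As a point of reference for the hashing approach discussed in the options, here is a minimal, illustrative sketch (table names are hypothetical placeholders) of comparing two BigQuery tables by hashing their row contents with FARM_FINGERPRINT in standard SQL; it is a sanity check, not a definitive comparison method:

```python
# Illustrative sketch only: compare two BigQuery tables by hashing row contents.
# The dataset/table names `etl.original_output` and `etl.migrated_output` are
# hypothetical. BIT_XOR makes the aggregate order-independent, so no sort is needed.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
WITH hashed AS (
  SELECT 'original' AS source,
         BIT_XOR(FARM_FINGERPRINT(TO_JSON_STRING(t))) AS table_hash
  FROM `etl.original_output` AS t
  UNION ALL
  SELECT 'migrated' AS source,
         BIT_XOR(FARM_FINGERPRINT(TO_JSON_STRING(t))) AS table_hash
  FROM `etl.migrated_output` AS t
)
SELECT COUNT(DISTINCT table_hash) = 1 AS tables_match
FROM hashed
"""

for row in client.query(QUERY).result():
    print("Tables identical:", row.tables_match)
```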

NEW QUESTION 2

You are working on a linear regression model on BigQuery ML to predict a customer's likelihood of purchasing your company's products. Your model uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want to prepare your data using the least amount of coding while maintaining the predictive variables. What should you do?

  • A. Use SQL in BigQuery to transform the city column using a one-hot encoding method, and make each city a column with binary values.
  • B. Create a new view with BigQuery that does not include a column with city information.
  • C. Use Cloud Data Fusion to assign each city to a region that is labeled as 1, 2, 3, 4, or 5, and then use that number to represent the city in the model.
  • D. Use TensorFlow to create a categorical variable with a vocabulary list. Create the vocabulary file and upload that as part of your model to BigQuery ML.

Answer: C
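
For context, option A describes one-hot encoding a categorical column directly in SQL; below is a hedged sketch of that technique with hypothetical table, column, and city values (it illustrates the encoding, not the keyed answer):

```python
# Illustrative sketch of one-hot encoding in BigQuery SQL (option A's technique).
# Table/column names and city values are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()

ONE_HOT_SQL = """
SELECT
  purchase_amount,
  IF(city = 'New York', 1, 0) AS city_new_york,
  IF(city = 'Chicago',  1, 0) AS city_chicago,
  IF(city = 'Seattle',  1, 0) AS city_seattle
FROM `my_dataset.customer_training_data`
"""

# Requires pandas (and db-dtypes) installed for to_dataframe().
df = client.query(ONE_HOT_SQL).to_dataframe()
print(df.head())
```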

NEW QUESTION 3

You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?

  • A. Disable caching by editing the report settings.
  • B. Disable caching in BigQuery by editing table details.
  • C. Refresh your browser tab showing the visualizations.
  • D. Clear your browser history for the past hour, then reload the tab showing the visualizations.

Answer: A

Explanation:
Reference: https://support.google.com/datastudio/answer/7020039?hl=en

NEW QUESTION 4

You have developed three data processing jobs. One executes a Cloud Dataflow pipeline that transforms data uploaded to Cloud Storage and writes results to BigQuery. The second ingests data from on-premises servers and uploads it to Cloud Storage. The third is a Cloud Dataflow pipeline that gets information from third-party data providers and uploads the information to Cloud Storage. You need to be able to schedule and monitor the execution of these three workflows and manually execute them when needed. What should you do?

  • A. Create a Directed Acyclic Graph (DAG) in Cloud Composer to schedule and monitor the jobs.
  • B. Use Stackdriver Monitoring and set up an alert with a Webhook notification to trigger the jobs.
  • C. Develop an App Engine application to schedule and request the status of the jobs using GCP API calls.
  • D. Set up cron jobs in a Compute Engine instance to schedule and monitor the pipelines using GCP API calls.

Answer: D
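
For context, option A describes orchestrating the three workflows with a Cloud Composer (Apache Airflow) DAG. A hypothetical sketch of such a DAG follows; the operators, script paths, schedule, and job names are placeholders for illustration only, not part of the question or its keyed answer:

```python
# Hypothetical Cloud Composer / Airflow DAG sketch for scheduling three pipelines.
# Script paths and task names are placeholders; a DAG can also be triggered
# manually from the Airflow UI, satisfying the "manually execute" requirement.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="three_data_pipelines",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_onprem = BashOperator(
        task_id="ingest_onprem_to_gcs",
        bash_command="python /home/airflow/gcs/dags/scripts/upload_onprem.py",
    )
    third_party_pipeline = BashOperator(
        task_id="run_third_party_dataflow",
        bash_command="python /home/airflow/gcs/dags/scripts/third_party_pipeline.py",
    )
    transform_pipeline = BashOperator(
        task_id="run_transform_dataflow",
        bash_command="python /home/airflow/gcs/dags/scripts/transform_to_bq.py",
    )

    # Run the two ingestion jobs before the transform-to-BigQuery job.
    [ingest_onprem, third_party_pipeline] >> transform_pipeline
```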

NEW QUESTION 5

Your United States-based company has created an application for assessing and responding to user actions. The primary table’s data volume grows by 250,000 records per second. Many third parties use your application’s APIs to build the functionality into their own frontend applications. Your application’s APIs should comply with the following requirements:
  • Single global endpoint
  • ANSI SQL support
  • Consistent access to the most up-to-date data
What should you do?

  • A. Implement BigQuery with no region selected for storage or processing.
  • B. Implement Cloud Spanner with the leader in North America and read-only replicas in Asia and Europe.
  • C. Implement Cloud SQL for PostgreSQL with the master in North America and read replicas in Asia and Europe.
  • D. Implement Cloud Bigtable with the primary cluster in North America and secondary clusters in Asia and Europe.

Answer: B
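
For reference, a minimal sketch of issuing an ANSI SQL query against Cloud Spanner with the Python client library; instance, database, and table names are hypothetical, and snapshot reads default to strong consistency:

```python
# Minimal Cloud Spanner query sketch (hypothetical instance/database/table names).
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

# A snapshot read without an explicit timestamp bound is strongly consistent.
with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        "SELECT user_id, action, event_time "
        "FROM user_actions "
        "ORDER BY event_time DESC LIMIT 10"
    )
    for row in rows:
        print(row)
```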

NEW QUESTION 6

Your company is currently setting up data pipelines for their campaign. For all the Google Cloud Pub/Sub streaming data, one of the important business requirements is to be able to periodically identify the inputs and their timings during the campaign. Engineers have decided to use windowing and transformation in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for all the streaming inserts. What is the most likely cause of this problem?

  • A. They have not assigned the timestamp, which causes the job to fail
  • B. They have not set the triggers to accommodate the data coming in late, which causes the job to fail
  • C. They have not applied a global windowing function, which causes the job to fail when the pipeline is created
  • D. They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created

Answer: C
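
For context on the windowing the options discuss, here is a minimal Apache Beam (Python) sketch of applying a windowing function to a streaming Pub/Sub source before aggregation; the topic name and window size are placeholders:

```python
# Sketch: apply a fixed (non-global) window to a streaming PCollection before
# grouping/combining. The Pub/Sub topic and 60-second window are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/campaign-events")
        | "ParseKeyValue" >> beam.Map(lambda msg: (msg.decode("utf-8"), 1))
        | "FixedWindows" >> beam.WindowInto(beam.window.FixedWindows(60))
        | "CountPerKey" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```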

NEW QUESTION 7

Which is the preferred method to use to avoid hotspotting in time series data in Bigtable?

  • A. Field promotion
  • B. Randomization
  • C. Salting
  • D. Hashing

Answer: A

Explanation:
By default, prefer field promotion. Field promotion avoids hotspotting in almost all cases, and it tends to make it easier to design a row key that facilitates queries.
Reference:
https://cloud.google.com/bigtable/docs/schema-design-time-series#ensure_that_your_row_key_avoids_hotspotti
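
To make field promotion concrete, here is a minimal sketch of building a Bigtable row key in which identifying fields are promoted ahead of the timestamp; the field names and values are hypothetical:

```python
# Field promotion sketch: identifying fields come before the timestamp in the
# row key, spreading writes across the key space instead of hotspotting on time.
def make_row_key(device_id: str, metric: str, event_ts: int) -> bytes:
    # Produces e.g. b"device-4521#cpu_load#1718040000"
    return f"{device_id}#{metric}#{event_ts}".encode("utf-8")

print(make_row_key("device-4521", "cpu_load", 1718040000))
```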

NEW QUESTION 8

Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data. They want to improve this performance while minimizing cost. What should they do?

  • A. Redefine the schema by evenly distributing reads and writes across the row space of the table.
  • B. The performance issue should be resolved over time as the size of the Bigtable cluster is increased.
  • C. Redesign the schema to use a single row key to identify values that need to be updated frequently in the cluster.
  • D. Redesign the schema to use row keys based on numeric IDs that increase sequentially per user viewing the offers.

Answer: A

NEW QUESTION 9

You’re training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you’ve discovered that the dataset contains the latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you’d like to engineer a feature that incorporates this physical dependency.
What should you do?

  • A. Provide latitude and longitude as input vectors to your neural net.
  • B. Create a numeric column from a feature cross of latitude and longitude.
  • C. Create a feature cross of latitude and longitude, bucketize at the minute level and use L1 regularization during optimization.
  • D. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization.

Answer: C

Explanation:
Reference: https://cloud.google.com/bigquery/docs/gis-data
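
As an illustration of a bucketized latitude x longitude feature cross, here is a hedged sketch using the classic tf.feature_column API; the bucket boundaries and hash bucket size are arbitrary examples, not values from the question:

```python
# Sketch of a bucketized lat/lon feature cross with tf.feature_column.
# Boundaries and hash_bucket_size are illustrative placeholders.
import tensorflow as tf

lat = tf.feature_column.numeric_column("latitude")
lon = tf.feature_column.numeric_column("longitude")

lat_buckets = tf.feature_column.bucketized_column(
    lat, boundaries=[33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0])
lon_buckets = tf.feature_column.bucketized_column(
    lon, boundaries=[-123.0, -122.0, -121.0, -120.0, -119.0, -118.0, -117.0])

# Cross the bucketized coordinates so the model can learn location-specific effects.
lat_lon_cross = tf.feature_column.crossed_column(
    [lat_buckets, lon_buckets], hash_bucket_size=10000)
cross_feature = tf.feature_column.indicator_column(lat_lon_cross)
```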

NEW QUESTION 10

The YARN ResourceManager and the HDFS NameNode interfaces are available on a Cloud Dataproc cluster _____.

  • A. application node
  • B. conditional node
  • C. master node
  • D. worker node

Answer: C

Explanation:
The YARN ResourceManager and the HDFS NameNode interfaces are available on a Cloud Dataproc cluster master node. The cluster master-host-name is the name of your Cloud Dataproc cluster followed by an -m suffix—for example, if your cluster is named "my-cluster", the master-host-name would be "my-cluster-m".
Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces#interfaces

NEW QUESTION 11

Scaling a Cloud Dataproc cluster typically involves _____.

  • A. increasing or decreasing the number of worker nodes
  • B. increasing or decreasing the number of master nodes
  • C. moving memory to run more applications on a single node
  • D. deleting applications from unused nodes periodically

Answer: A

Explanation:
After creating a Cloud Dataproc cluster, you can scale the cluster by increasing or decreasing the number of worker nodes in the cluster at any time, even when jobs are running on the cluster. Cloud Dataproc clusters are typically scaled to:
1) increase the number of workers to make a job run faster
2) decrease the number of workers to save money
3) increase the number of nodes to expand available Hadoop Distributed Filesystem (HDFS) storage
Reference: https://cloud.google.com/dataproc/docs/concepts/scaling-clusters

NEW QUESTION 12

You are testing a Dataflow pipeline to ingest and transform text files. The files are compressed with gzip, errors are written to a dead-letter queue, and you are using SideInputs to join data. You noticed that the pipeline is taking longer to complete than expected. What should you do to expedite the Dataflow job?

  • A. Switch to compressed Avro files
  • B. Reduce the batch size
  • C. Retry records that throw an error
  • D. Use CoGroupByKey instead of the SideInput

Answer: B
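
For reference, a minimal Apache Beam (Python) sketch of the CoGroupByKey join mentioned in option D; the keys and values are made-up placeholders purely for illustration:

```python
# Sketch: join two keyed PCollections with CoGroupByKey instead of a side input.
import apache_beam as beam

with beam.Pipeline() as p:
    orders = p | "Orders" >> beam.Create([("cust-1", "order-100"), ("cust-2", "order-200")])
    emails = p | "Emails" >> beam.Create([("cust-1", "a@example.com"), ("cust-2", "b@example.com")])

    (
        {"orders": orders, "emails": emails}
        | "JoinByCustomer" >> beam.CoGroupByKey()  # groups values from both inputs per key
        | "Print" >> beam.Map(print)
    )
```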

NEW QUESTION 13

You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possible for long-running batch jobs. You want to use a managed service. What should you do?

  • A. Deploy a Cloud Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
  • B. Deploy a Cloud Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
  • C. Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs://
  • D. Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs://

Answer: A
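
All of the Dataproc options reference switching script paths from hdfs:// to gs://. A minimal PySpark sketch of that change follows; the bucket name and paths are placeholders:

```python
# Sketch: read the same data from Cloud Storage instead of HDFS on Dataproc.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-example").getOrCreate()

# Before (on-premises HDFS):
# df = spark.read.parquet("hdfs:///data/events/2024/")

# After (Cloud Storage via the GCS connector available on Dataproc):
df = spark.read.parquet("gs://my-bucket/data/events/2024/")
df.groupBy("event_type").count().show()
```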

NEW QUESTION 14

Which of these statements about exporting data from BigQuery is false?

  • A. To export more than 1 GB of data, you need to put a wildcard in the destination filename.
  • B. The only supported export destination is Google Cloud Storage.
  • C. Data can only be exported in JSON or Avro format.
  • D. The only compression option available is GZIP.

Answer: C

Explanation:
Data can be exported in CSV, JSON, or Avro format. If you are exporting nested or repeated data, then CSV format is not supported.
Reference: https://cloud.google.com/bigquery/docs/exporting-data
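
For illustration, a hedged sketch of exporting a BigQuery table to Cloud Storage with the Python client; the project, dataset, table, and bucket names are placeholders, and CSV with GZIP is chosen only as an example of the supported format/compression options:

```python
# Sketch: export a BigQuery table to Cloud Storage as gzipped CSV.
# The wildcard in the destination URI lets exports larger than 1 GB be sharded.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.job.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV,
    compression=bigquery.Compression.GZIP,
)

extract_job = client.extract_table(
    "my_project.my_dataset.my_table",
    "gs://my-bucket/exports/my_table-*.csv.gz",
    job_config=job_config,
)
extract_job.result()  # wait for the export job to finish
```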

NEW QUESTION 15

You need ads data to serve AI models and historical data for analytics; longtail and outlier data points need to be identified. You want to cleanse the data in near-real time before running it through AI models. What should you do?

  • A. Use BigQuery to ingest, prepare, and then analyze the data, and then run queries to create views
  • B. Use Cloud Storage as a data warehouse, shell scripts for processing, and BigQuery to create views for desired datasets
  • C. Use Dataflow to identify longtail and outlier data points programmatically, with BigQuery as a sink
  • D. Use Cloud Composer to identify longtail and outlier data points, and then output a usable dataset to BigQuery

Answer: A

NEW QUESTION 16

If you want to create a machine learning model that predicts the price of a particular stock based on its recent price history, what type of estimator should you use?

  • A. Unsupervised learning
  • B. Regressor
  • C. Classifier
  • D. Clustering estimator

Answer: B

Explanation:
Regression is the supervised learning task for modeling and predicting continuous, numeric variables. Examples include predicting real-estate prices, stock price movements, or student test scores.
Classification is the supervised learning task for modeling and predicting categorical variables. Examples include predicting employee churn, email spam, financial fraud, or student letter grades.
Clustering is an unsupervised learning task for finding natural groupings of observations (i.e. clusters) based on the inherent structure within your dataset. Examples include customer segmentation, grouping similar items in e-commerce, and social network analysis.
Reference: https://elitedatascience.com/machine-learning-algorithms
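
To make the distinction concrete, here is a tiny regressor sketch in scikit-learn; the price data is fabricated purely for illustration and is not from the question:

```python
# Minimal regression sketch: predict a continuous value (next price) from
# the three previous closing prices. Numbers are made up for illustration.
from sklearn.linear_model import LinearRegression

X = [[10.0, 10.2, 10.1],
     [10.2, 10.1, 10.4],
     [10.1, 10.4, 10.6]]
y = [10.4, 10.6, 10.5]

model = LinearRegression().fit(X, y)
print(model.predict([[10.4, 10.6, 10.5]]))  # a continuous, numeric prediction
```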

NEW QUESTION 17

You are building a model to make clothing recommendations. You know a user’s fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?

  • A. Continuously retrain the model on just the new data.
  • B. Continuously retrain the model on a combination of existing data and the new data.
  • C. Train on the existing data while using the new data as your test set.
  • D. Train on the new data while using the existing data as your test set.

Answer: C

Explanation:
https://cloud.google.com/automl-tables/docs/prepare

NEW QUESTION 18

You are choosing a NoSQL database to handle telemetry data submitted from millions of Internet-of-Things (IoT) devices. The volume of data is growing at 100 TB per year, and each data entry has about 100 attributes. The data processing pipeline does not require atomicity, consistency, isolation, and durability (ACID). However, high availability and low latency are required.
You need to analyze the data by querying against individual fields. Which three databases meet your requirements? (Choose three.)

  • A. Redis
  • B. HBase
  • C. MySQL
  • D. MongoDB
  • E. Cassandra
  • F. HDFS with Hive

Answer: BDF

NEW QUESTION 19

Data Analysts in your company have the Cloud IAM Owner role assigned to them in their projects to allow them to work with multiple GCP products in their projects. Your organization requires that all BigQuery data access logs be retained for 6 months. You need to ensure that only audit personnel in your company can access the data access logs for all projects. What should you do?

  • A. Enable data access logs in each Data Analyst’s project. Restrict access to Stackdriver Logging via Cloud IAM roles.
  • B. Export the data access logs via a project-level export sink to a Cloud Storage bucket in the Data Analysts’ projects. Restrict access to the Cloud Storage bucket.
  • C. Export the data access logs via a project-level export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project with the exported logs.
  • D. Export the data access logs via an aggregated export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project that contains the exported logs.

Answer: D

NEW QUESTION 20
......

P.S. Thedumpscentre.com is now offering 100% pass-guarantee Professional-Data-Engineer dumps! All Professional-Data-Engineer exam questions have been updated with correct answers: https://www.thedumpscentre.com/Professional-Data-Engineer-dumps/ (370 New Questions)