TEST BANK and SOLUTION MANUAL for Data Analytics
for Accounting, 3rd Edition by Vernon Richardson

A financial services company needs to aggregate daily stock trade data from the
exchanges into a data store. The company requires that data be streamed directly
into the data store, but also occasionally allows data to be modified using SQL. The
solution should support complex, analytic queries running with minimal latency.
The solution must provide a business intelligence dashboard that enables viewing of
the top contributors to anomalies in stock prices. Which solution meets the
company's requirements? - ANSWER: Use Amazon Kinesis Data Firehose to stream
data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon
QuickSight to create a business intelligence dashboard.

Key points to arrive at this answer:
• Data streamed DIRECTLY into the data store = Kinesis Data Firehose does this.
• Complex, analytic queries with minimal latency = Amazon Redshift, the OLAP use case and a supported Firehose destination.
• Business intelligence dashboard = Amazon QuickSight (see the delivery-stream sketch below).
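
As a rough illustration of this pipeline, the boto3 sketch below creates a Firehose delivery stream with a Redshift destination. Every name, ARN, and credential is a hypothetical placeholder; Firehose stages batches in S3 and issues a COPY into Redshift, which is why an S3 configuration is required.

    import boto3

    firehose = boto3.client("firehose")

    # Minimal sketch: stream trade data directly into Redshift via Firehose.
    # All identifiers (stream name, role ARN, JDBC URL, table, bucket) are
    # assumed placeholders, not values from the question.
    firehose.create_delivery_stream(
        DeliveryStreamName="stock-trades",
        DeliveryStreamType="DirectPut",  # producers write straight to Firehose
        RedshiftDestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
            "ClusterJDBCURL": "jdbc:redshift://analytics.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
            "CopyCommand": {
                "DataTableName": "trades",
                "CopyOptions": "FORMAT AS JSON 'auto'",
            },
            "Username": "firehose_user",
            "Password": "REPLACE_ME",
            # Firehose stages records in S3, then runs COPY into Redshift.
            "S3Configuration": {
                "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
                "BucketARN": "arn:aws:s3:::stock-trades-staging",
            },
        },
    )

QuickSight would then register the Redshift cluster as a data source for the dashboard.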

A financial company hosts a data lake in Amazon S3 and a data warehouse on an
Amazon Redshift cluster. The company uses Amazon QuickSight to build dashboards
and wants to secure access from its on-premises Active Directory to Amazon
QuickSight. How should the data be secured? - ANSWER: Use an Active Directory
connector and single sign-on (SSO) in a corporate network environment.
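
For context, Active Directory authentication is chosen when the QuickSight account subscription is created. The boto3 sketch below is a hypothetical illustration only; the account ID, directory ID, realm, and group names are all assumed values.

    import boto3

    quicksight = boto3.client("quicksight")

    # Hypothetical values throughout: account ID, directory, realm, and groups.
    quicksight.create_account_subscription(
        AwsAccountId="123456789012",
        AccountName="corp-analytics",
        NotificationEmail="bi-admins@example.com",
        Edition="ENTERPRISE",                      # AD auth requires Enterprise
        AuthenticationMethod="ACTIVE_DIRECTORY",
        ActiveDirectoryName="corp.example.com",
        DirectoryId="d-1234567890",                # AD Connector directory ID
        Realm="corp.example.com",
        AdminGroup=["quicksight-admins"],
    )

With an AD Connector and SSO in place on the corporate network, users then sign in to QuickSight with their existing AD credentials.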

A real estate company has a mission-critical application using Apache HBase in
Amazon EMR. Amazon EMR is configured with a single master node. The company
has over 5 TB of data stored on a Hadoop Distributed File System (HDFS). The
company wants a cost-effective solution to make its HBase data highly available.
Which architectural pattern meets the company's requirements? - ANSWER: Store the
data on an EMR File System (EMRFS) instead of HDFS and enable EMRFS consistent
view. Create a primary EMR HBase cluster with multiple master nodes. Create a
secondary EMR HBase read-replica cluster in a separate Availability Zone. Point both
clusters to the same HBase root directory in the same Amazon S3 bucket.
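
A hedged boto3 sketch of that pattern follows. The bucket, subnets, release label, and instance values are assumptions; the key ideas are the hbase-on-S3 storage mode, the shared hbase.rootdir, a three-master primary cluster, and the read-replica flag on the secondary.

    import boto3

    emr = boto3.client("emr")

    # Shared HBase-on-S3 settings: EMRFS storage mode, consistent view, and a
    # common root directory (bucket name assumed).
    def hbase_configs(read_replica=False):
        hbase_props = {"hbase.emr.storageMode": "s3"}
        if read_replica:
            hbase_props["hbase.emr.readreplica.enabled"] = "true"
        return [
            {"Classification": "hbase", "Properties": hbase_props},
            {"Classification": "hbase-site",
             "Properties": {"hbase.rootdir": "s3://hbase-ha-example/hbase"}},
            {"Classification": "emrfs-site",
             "Properties": {"fs.s3.consistent": "true"}},  # EMRFS consistent view
        ]

    def launch(name, subnet, read_replica=False):
        return emr.run_job_flow(
            Name=name,
            ReleaseLabel="emr-5.30.0",
            Applications=[{"Name": "HBase"}],
            Configurations=hbase_configs(read_replica),
            Instances={
                "InstanceGroups": [
                    # Three master nodes give master-node high availability.
                    {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                     "InstanceCount": 3},
                    {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                     "InstanceCount": 4},
                ],
                "Ec2SubnetId": subnet,
                "KeepJobFlowAliveWhenNoSteps": True,
            },
            JobFlowRole="EMR_EC2_DefaultRole",
            ServiceRole="EMR_DefaultRole",
        )

    # Primary and read-replica clusters in different AZ subnets share one root dir.
    launch("hbase-primary", "subnet-aaaa1111")
    launch("hbase-replica", "subnet-bbbb2222", read_replica=True)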

A software company hosts an application on AWS, and new features are released
weekly. As part of the application testing process, a solution must be developed that
analyzes logs from each Amazon EC2 instance to ensure that the application is
working as expected after each deployment. The collection and analysis solution
should be highly available with the ability to display new information with minimal
delays. Which method should the company use to collect and analyze the logs? -
ANSWER: Use the Amazon Kinesis Producer Library (KPL) agent on Amazon EC2 to
collect and send data to Kinesis Data Firehose to further push the data to Amazon
Elasticsearch Service and Kibana.
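
On each EC2 instance, the Kinesis Agent is driven by /etc/aws-kinesis/agent.json. The snippet below generates a minimal config as a sketch; the log path, region endpoint, and delivery stream name are assumed.

    import json

    # Minimal Kinesis Agent config (sketch): tail the application logs and send
    # them to an assumed Firehose delivery stream.
    agent_config = {
        "firehose.endpoint": "firehose.us-east-1.amazonaws.com",
        "flows": [
            {
                "filePattern": "/var/log/myapp/*.log",
                "deliveryStream": "app-logs-to-es",
            }
        ],
    }

    with open("/etc/aws-kinesis/agent.json", "w") as f:
        json.dump(agent_config, f, indent=2)

The delivery stream's destination would be configured as the Amazon ES domain, which Kibana then visualizes with minimal delay.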

KDF data sources: Kinesis SDK, CloudWatch Logs & Events, Kinesis Agent, KPL, and
Kinesis Data Streams. KDF outputs to S3, Redshift, Amazon Elasticsearch Service, and
Kinesis Data Analytics.

Kinesis Data Streams is always a polling service: consumers poll records from KDS.
Consumers include the KCL, Lambda, Kinesis Data Firehose, and Kinesis Data Analytics.
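
To make the pull model concrete, a bare-bones consumer polling one shard with boto3 might look like the sketch below. The stream name and shard ID are assumed; a real consumer would use the KCL for checkpointing and multi-shard coordination.

    import time
    import boto3

    kinesis = boto3.client("kinesis")

    # Sketch: poll a single shard of an assumed stream from the oldest record.
    iterator = kinesis.get_shard_iterator(
        StreamName="example-stream",
        ShardId="shardId-000000000000",
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    while iterator:
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in resp["Records"]:
            print(record["Data"])  # process each record
        iterator = resp.get("NextShardIterator")
        time.sleep(1)  # stay under the per-shard read limits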

A data analyst is using AWS Glue to organize, cleanse, validate, and format a 200 GB
dataset. The data analyst triggered the job to run with the Standard worker type.
After 3 hours, the AWS Glue job status is still RUNNING. Logs from the job run show
no error codes. The data analyst wants to improve the job execution time without
overprovisioning. Which actions should the data analyst take? - ANSWER: Enable job
metrics in AWS Glue to estimate the number of data processing units (DPUs). Based
on the profiled metrics, increase the value of the maximum capacity job parameter.
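
A hedged boto3 sketch of that answer: turn on job metrics via the --enable-metrics special parameter, then raise MaxCapacity once the profiled DPU usage is known. The job name and the target of 20 DPUs are assumptions.

    import boto3

    glue = boto3.client("glue")

    # update_job replaces the whole job definition, so start from the current one.
    job = glue.get_job(JobName="curation-job")["Job"]
    update = {k: v for k, v in job.items()
              if k in ("Role", "Command", "DefaultArguments", "GlueVersion", "Timeout")}

    # Step 1: enable job metrics so DPU usage can be profiled in CloudWatch.
    update["DefaultArguments"] = {**update.get("DefaultArguments", {}),
                                  "--enable-metrics": "true"}

    # Step 2: after profiling, raise the maximum capacity (assumed value).
    update["MaxCapacity"] = 20.0

    glue.update_job(JobName="curation-job", JobUpdate=update)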

A company has a business unit uploading .csv files to an Amazon S3 bucket. The
company's data platform team has set up an AWS Glue crawler to do discovery and
create tables and schemas. An AWS Glue job writes processed data from the created
tables to an Amazon Redshift database. The AWS Glue job handles column mapping
and creating the Amazon Redshift table appropriately. When the AWS Glue job is
rerun for any reason in a day, duplicate records are introduced into the Amazon
Redshift table. Which solution will update the Redshift table without duplicates
when jobs are rerun? - ANSWER: Modify the AWS Glue job to copy the rows into a
staging table. Add SQL commands to replace the existing rows in the main table as
postactions in the DynamicFrameWriter class.
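
In Glue script terms, the answer boils down to loading a staging table and attaching merge SQL as postactions. The sketch below assumes the table names, catalog names, connection name, and an id key column; none of these come from the original question.

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Source DynamicFrame from the crawler-created catalog table (names assumed).
    processed_dyf = glue_context.create_dynamic_frame.from_catalog(
        database="csv_db", table_name="business_unit_csv")

    # Run by Redshift AFTER the staging load, in one transaction: replace the
    # rows being re-loaded, then drop the staging table. Reruns stay idempotent.
    post_actions = (
        "BEGIN;"
        "DELETE FROM main_table USING staging_table "
        "WHERE main_table.id = staging_table.id;"
        "INSERT INTO main_table SELECT * FROM staging_table;"
        "DROP TABLE staging_table;"
        "END;"
    )

    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=processed_dyf,
        catalog_connection="redshift-connection",  # assumed Glue connection name
        connection_options={
            "dbtable": "staging_table",
            "database": "dev",
            "postactions": post_actions,
        },
        redshift_tmp_dir="s3://glue-temp-example/redshift/",
    )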

A streaming application is reading data from Amazon Kinesis Data Streams and
immediately writing the data to an Amazon S3 bucket every 10 seconds. The
application is reading data from hundreds of shards. The batch interval cannot be
changed due to a separate requirement. The data is being accessed by Amazon
Athena. Users are seeing degradation in query performance as time progresses.
Which action can help improve query performance? - ANSWER: Merge the files in
Amazon S3 to form larger files.
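
One hedged way to do the merge is an Athena CTAS statement that rewrites the many small streamed objects into fewer, larger columnar files; bucket, table, and database names below are assumed.

    import boto3

    athena = boto3.client("athena")

    # CTAS compaction: queries against the compacted table scan far fewer,
    # larger S3 objects than the raw 10-second batches.
    athena.start_query_execution(
        QueryString="""
            CREATE TABLE events_compacted
            WITH (
                format = 'PARQUET',
                external_location = 's3://example-bucket/compacted/'
            ) AS
            SELECT * FROM events_raw
        """,
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
    )

An EMR S3DistCp step is another common way to merge small S3 files on a schedule.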

A company uses Amazon Elasticsearch Service (Amazon ES) to store and analyze its
website clickstream data. The company ingests 1 TB of data daily using Amazon
Kinesis Data Firehose and stores one day's worth of data in an Amazon ES cluster.
The company has very slow query performance on the Amazon ES index and
occasionally sees errors from Kinesis Data Firehose when attempting to write to the
index. The Amazon ES cluster has 10 nodes running a single index and 3 dedicated
master nodes. Each data node has 1.5 TB of Amazon EBS storage attached and the
cluster is configured with 1,000 shards. Occasionally, JVMMemoryPressure errors are
found in the cluster logs. Which solution will improve the performance of Amazon
ES? - ANSWER: Decrease the number of Amazon ES shards for the index.
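
As an illustration: shard count is fixed at index creation, so reducing it means creating a new index with fewer primaries and reindexing into it. The endpoint, index names, and shard count below are assumed, and real requests to an Amazon ES domain would need SigV4 signing or a permissive access policy.

    import requests

    # Assumed domain endpoint.
    es = "https://search-clickstream-xyz.us-east-1.es.amazonaws.com"

    # ~1 TB across 10 data nodes: ten primaries is a far saner target than 1,000.
    requests.put(f"{es}/clickstream-v2", json={
        "settings": {"number_of_shards": 10, "number_of_replicas": 1}
    })

    # Copy documents from the over-sharded index into the new one.
    requests.post(f"{es}/_reindex", json={
        "source": {"index": "clickstream"},
        "dest": {"index": "clickstream-v2"},
    })

Fewer, larger shards cut the per-shard JVM heap overhead that drives the JVMMemoryPressure errors and the Firehose write failures.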

A manufacturing company has been collecting IoT sensor data from devices on its
factory floor for a year and is storing the data in Amazon Redshift for daily analysis. A
data analyst has determined that, at an expected ingestion rate of about 2 TB per
day, the cluster will be undersized in less than 4 months. A long-term solution is
needed. The data analyst has indicated that most queries only reference the most
recent 13 months of data, yet there are also quarterly reports that need to query all
the data generated from the past 7 years. The chief technology officer (CTO) is
concerned about the costs, administrative effort, and performance of a long-term
solution. Which solution should the data analyst use to meet these requirements? -
ANSWER: Create a daily job in AWS Glue to UNLOAD records older than 13 months
to Amazon S3 and delete those records from Amazon Redshift. Create an external
table in Amazon Redshift to point to the S3 location. Use Amazon Redshift Spectrum
to join to data that is older than 13 months.
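
Sketched with the Redshift Data API below; the cluster, role ARN, schema, and table names are all assumptions. The daily Glue job would run the UNLOAD and DELETE, while the external schema makes the archived S3 data queryable through Redshift Spectrum.

    import boto3

    rsd = boto3.client("redshift-data")

    def run_sql(sql):
        # Fire-and-forget sketch; a real job would poll describe_statement.
        return rsd.execute_statement(
            ClusterIdentifier="iot-cluster",
            Database="dev",
            DbUser="admin",
            Sql=sql,
        )

    # Daily: move rows older than 13 months out to S3 as Parquet...
    run_sql("""
        UNLOAD ('SELECT * FROM sensor_data
                 WHERE reading_date < DATEADD(month, -13, CURRENT_DATE)')
        TO 's3://iot-archive/sensor_data/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole'
        FORMAT AS PARQUET;
    """)

    # ...and delete them from the cluster to reclaim space.
    run_sql("DELETE FROM sensor_data "
            "WHERE reading_date < DATEADD(month, -13, CURRENT_DATE);")

    # One-time: external schema so Spectrum can join the S3 archive with the
    # hot 13 months still stored in Redshift.
    run_sql("""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
        FROM DATA CATALOG DATABASE 'iot_archive'
        IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole'
        CREATE EXTERNAL DATABASE IF NOT EXISTS;
    """)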

An insurance company has raw data in JSON format that is sent without a predefined
schedule through an Amazon Kinesis Data Firehose delivery stream to an Amazon S3
bucket. An AWS Glue crawler is scheduled to run every 8 hours to update the
schema in the data catalog of the tables stored in the S3 bucket. Data analysts
analyze the data using Apache Spark SQL on Amazon EMR set up with AWS Glue
Data Catalog as the metastore. Data analysts say that, occasionally, the data they
receive is stale. A data engineer needs to provide access to the most up-to-date
data. Which solution meets these requirements? - ANSWER: Run the AWS Glue
crawler from an AWS Lambda function triggered by an S3:ObjectCreated:* event
notification on the S3 bucket.
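
A minimal Lambda handler for that trigger might look like the sketch below (crawler name assumed); the S3:ObjectCreated:* event notification itself is configured on the bucket and pointed at this function.

    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        # Fired on each new Firehose object: refresh the Data Catalog right
        # away instead of waiting up to 8 hours for the scheduled crawl.
        try:
            glue.start_crawler(Name="raw-json-crawler")  # assumed crawler name
        except glue.exceptions.CrawlerRunningException:
            pass  # a crawl is already in flight; the new object will be picked up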

A company that produces network devices has millions of users. Data is collected
from the devices on an hourly basis and stored in an Amazon S3 data lake. The
company runs analyses on the last 24 hours of data flow logs for abnormality
detection and to troubleshoot and resolve user issues. The company also analyzes
historical logs dating back 2 years to discover patterns and look for improvement
opportunities. The data flow logs contain many metrics, such as date, timestamp,
source IP, and target IP. There are about 10 billion events every day. How should this
data be stored for optimal performance? - ANSWER: In Apache ORC partitioned by
date and sorted by source IP
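
A PySpark sketch of that layout, with paths and column names assumed: partitioning by date lets the 24-hour queries prune to a single day, and sorting within partitions by source IP lets ORC's built-in indexes skip row groups.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("flow-logs-to-orc").getOrCreate()

    logs = spark.read.json("s3://example-datalake/raw/flow-logs/")

    (logs
        .repartition("event_date")            # one write task per date partition
        .sortWithinPartitions("source_ip")    # sorted for ORC predicate pushdown
        .write
        .mode("append")
        .partitionBy("event_date")
        .format("orc")
        .save("s3://example-datalake/curated/flow-logs-orc/"))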

A banking company is currently using an Amazon Redshift cluster with dense storage
(DS) nodes to store sensitive data. An audit found that the cluster is unencrypted.
Compliance requirements state that a database with sensitive data must be
encrypted through a hardware security module (HSM) with automated key rotation.
Which combination of steps is required to achieve compliance? (Choose two.) -
ANSWER: Set up a trusted connection with the HSM using a client and server certificate
with automatic key rotation, and create a new HSM-encrypted Amazon Redshift
cluster and migrate the data to the new cluster.
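
A rough boto3 sketch of the two steps; the certificate identifiers, HSM address, passwords, and node counts are placeholders. An existing unencrypted cluster cannot simply be switched to HSM encryption, hence the new cluster and data migration.

    import boto3

    redshift = boto3.client("redshift")

    # Step 1: trusted connection between Redshift and the HSM via certificates.
    redshift.create_hsm_client_certificate(
        HsmClientCertificateIdentifier="rs-hsm-client-cert"
    )
    redshift.create_hsm_configuration(
        HsmConfigurationIdentifier="rs-hsm-config",
        Description="CloudHSM connection for Redshift key management",
        HsmIpAddress="10.0.0.50",                      # assumed HSM address
        HsmPartitionName="redshift",
        HsmPartitionPassword="REPLACE_ME",
        HsmServerPublicCertificate="-----BEGIN CERTIFICATE-----...",
    )

    # Step 2: a NEW cluster encrypted through the HSM; data is then migrated
    # (e.g., UNLOAD/COPY) from the old unencrypted cluster.
    redshift.create_cluster(
        ClusterIdentifier="secure-ds-cluster",
        NodeType="ds2.xlarge",
        NumberOfNodes=4,
        MasterUsername="admin",
        MasterUserPassword="REPLACE_ME",
        Encrypted=True,
        HsmClientCertificateIdentifier="rs-hsm-client-cert",
        HsmConfigurationIdentifier="rs-hsm-config",
    )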

A company is planning to do a proof of concept for a machine learning (ML) project
using Amazon SageMaker with a subset of existing on-premises data hosted in the
company's 3 TB data warehouse. For part of the project, AWS Direct Connect is
established and tested. To prepare the data for ML, data analysts are performing
data curation. The data analysts want to perform multiple steps, including mapping,
