mgg exam tidyverse/50 Questions
with answers
- The most important part of the Tidyverse for genomics is ggplot2, which
will allow us to - - make spectacular graphs and data visualizations using
relatively simple commands.
- What is the Tidyverse? - - A collection of R packages that clean up messy
data and then provide powerful tools that work well only on cleaned-up
("tidy") data.
- Cleaning data sets with tidyr - - Also known as data wrangling.
The package you use to tidy up data is called tidyr ("Tidy-R"). It provides
one-stop commands that clean up data.
- The main tidyr commands - - 1. gather
2. separate
3. spread
4. unite
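As a minimal sketch of the last command in this list, unite pastes several columns into one (the reverse of separate). The data frame and its column names (chrom, pos, variant) are made up for illustration and are not from the course material.

```r
library(tidyr)

# Hypothetical variant positions; chrom and pos are illustrative names.
parts <- data.frame(
  chrom = c("chr1", "chr2"),
  pos = c(12345L, 67890L),
  stringsAsFactors = FALSE
)

# unite() pastes chrom and pos into one new column called "variant",
# joined by ":"; the two input columns are dropped by default.
united <- unite(parts, "variant", chrom, pos, sep = ":")
print(united)
```

Passing remove = FALSE to unite would keep the original chrom and pos columns alongside the new one.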
- What is "tidy data"? - - The concept of "tidy data" is built on a set of
standards for data. Following these standards makes it far easier to
manipulate, model and visualize your data.
- Each variable forms a - - column
(e.g., a variant)
- Each observation or biological sample forms a - - row
(e.g., a sample)
- Each type of observational unit forms a - - table
- Messy data, in contrast, comes in five commonly encountered categories
which are - - 1. Column headers are values, not variable names.
2. Multiple variables are stored in one column.
3. Variables are stored in both rows and columns.
4. Multiple types of observational units are stored in the same table.
5. A single observational unit is stored in multiple tables.
- 1. Column headers are values, not variable names: example - - A table has
religious preference in the rows and income brackets for columns. Each cell
lists the number of people at that preference and income.
The problem is that the income brackets in the column headers are values,
not variable names. Income should be a single variable.
- How to fix the values-as-headers problem - - Make income a variable with
its own column, so that each row holds one preference, one income bracket,
and the count for that combination.
Use the gather command.
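The gather fix can be sketched as follows. This is only an illustration: the counts, the religion labels, and the column names under_10k and from_10k_to_20k are invented stand-ins for the real table.

```r
library(tidyr)

# Hypothetical counts: one row per religion, one column per income bracket.
messy <- data.frame(
  religion = c("Agnostic", "Atheist"),
  under_10k = c(27, 12),
  from_10k_to_20k = c(34, 27),
  stringsAsFactors = FALSE
)

# gather() stacks every column except religion into key-value pairs:
# "income" receives the old column headers, "count" the cell values.
tidy <- gather(messy, key = "income", value = "count", -religion)
print(tidy)
```

In current tidyr releases gather is superseded by pivot_longer, which does the same reshaping. gather still works and is the command these notes use.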
- 2. Multiple variables are stored in one column. - - For example, gender and
age ranges are combined into hybrid variables in the columns. "m04" means
"male, age 0 to 4". The values in the middle are the number of cases, with NA
meaning "not available"; i.e., no cases.
- the way to fix multiple variables in a column - - The way to tidy up this
data is to SEPARATE the mixed variables into separate variables of gender
and age group.
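A minimal sketch of the separate step, assuming the hybrid codes have already been gathered into a single column. The data frame, the column names (country, group, n), and the values are hypothetical.

```r
library(tidyr)

# Hypothetical long-format case counts with hybrid codes such as "m04".
cases <- data.frame(
  country = c("AD", "AE"),
  group = c("m04", "f514"),
  n = c(0, 3),
  stringsAsFactors = FALSE
)

# A numeric sep is a character position: sep = 1 splits after the first
# character, so "m04" becomes gender "m" and age "04".
tidy <- separate(cases, group, into = c("gender", "age"), sep = 1)
print(tidy)
```

Both new columns are character vectors by default, so the age group "04" keeps its leading zero.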
- 3. Variables are stored in both rows and columns. - - Rows are only
supposed to have observations or samples. Variables only belong in columns.
For example, if you had a column labeled "Temp" that contained the variable
categories, Tmax and Tmin, this puts the variable categories into rows, not
columns.
- How to fix variables in rows and columns - - To tidy this data, you can
SPREAD the key-value pair of columns (the Tmax/Tmin labels and their
values) back out into separate columns, one for Tmax and one for Tmin.
Now there are no variables (Tmax and Tmin) in the rows. Tmax and Tmin
have been moved to the columns.
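The spread fix can be sketched like this. The dates and temperatures are invented, and the column name value is an assumption. Temp is the column name used in the example above.

```r
library(tidyr)

# Hypothetical weather readings: the Temp column holds variable names
# (Tmax, Tmin) in its rows, and value holds the measurements.
weather <- data.frame(
  date = c("2010-01-01", "2010-01-01", "2010-01-02", "2010-01-02"),
  Temp = c("Tmax", "Tmin", "Tmax", "Tmin"),
  value = c(27.8, 14.5, 27.3, 14.4),
  stringsAsFactors = FALSE
)

# spread() turns each distinct key in Temp into its own column,
# filled from value, leaving one row per date.
tidy <- spread(weather, key = Temp, value = value)
print(tidy)
```

In current tidyr releases spread is superseded by pivot_wider, but it still works.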
- 4. Multiple types of observational units are stored in the same table. - - As
an example, the Billboard data set contains information about (1) each song
(artist, song name, other metadata) and (2) the song's rank each week of
each month of each year. These two units of data should be split up into two
tables. Otherwise, you'll be repeating the same song data in every weekly row.
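One way to do this split is with dplyr, sketched below. The notes do not name a command for this case, so the approach, the toy data, and the song_id key are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical chart data: song details repeat on every weekly row.
billboard <- data.frame(
  artist = c("Artist A", "Artist A", "Artist B"),
  track = c("Song 1", "Song 1", "Song 2"),
  week = c(1, 2, 1),
  rank = c(87, 82, 91),
  stringsAsFactors = FALSE
)

# Table 1: one row per song, with a generated song_id as its key.
songs <- distinct(select(billboard, artist, track))
songs$song_id <- seq_len(nrow(songs))

# Table 2: one row per song per week, pointing back to the song by song_id.
ranks <- billboard %>%
  left_join(songs, by = c("artist", "track")) %>%
  select(song_id, week, rank)

print(songs)
print(ranks)
```

The artist and track now appear once per song in songs, and ranks carries only the key plus the weekly measurements.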
- 5. A single observational unit is stored in multiple tables. - - For example,
you could have a list of most popular baby names over the past five years,
with each year being a separate table. It would be easier to look at the data
if these five tables were combined into one.
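Combining the yearly tables can be sketched with dplyr's bind_rows. The notes do not name a command for this case, so this is one possible approach, and the names and ranks below are made up.

```r
library(dplyr)

# Hypothetical per-year tables with identical columns.
names_2019 <- data.frame(name = c("Ava", "Noah"), rank = c(1, 2),
                         stringsAsFactors = FALSE)
names_2020 <- data.frame(name = c("Mia", "Liam"), rank = c(1, 2),
                         stringsAsFactors = FALSE)

# bind_rows() stacks the tables; .id = "year" records which table each
# row came from, using the names of the list elements.
all_years <- bind_rows(
  list("2019" = names_2019, "2020" = names_2020),
  .id = "year"
)
print(all_years)
```

The year, which was previously stored only in the table names, becomes an ordinary variable in its own column.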
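- What is an aesthetic? - - An aesthetic is a particular mode for making data
values understandable to humans. An aesthetic is what you do with your
data or a subset of your data:
x/y axis, scatterplot, bar, line, color, shape, etc.
(In ggplot2 itself, x/y position, color, and shape are aesthetics, while
scatterplot, bar, and line are geoms, the marks drawn using those aesthetics.)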
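As a minimal ggplot2 sketch, the aes() call below maps three columns to three aesthetics (x position, y position, and color), and geom_point draws them as a scatterplot. The data frame and its columns (depth, quality, type) are hypothetical.

```r
library(ggplot2)

# Hypothetical variant calls; all three columns are illustrative.
variants <- data.frame(
  depth = c(10, 25, 40, 55),
  quality = c(20, 35, 50, 60),
  type = c("SNP", "indel", "SNP", "indel")
)

# aes() declares which column drives which aesthetic;
# geom_point() chooses the mark (points) used to draw them.
p <- ggplot(variants, aes(x = depth, y = quality, color = type)) +
  geom_point()
```

Calling print(p) draws the plot. Swapping geom_point for another geom changes the mark without changing the aesthetic mappings.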
- The type of data determines your choice of aesthetic in representing that
data. Here are the common types of data: - - Continuous numeric