Graded Assignment 6: R Data manipulation and visualization
This graded assignment is worth 10 points and is due on Monday Nov 24th at 11:59 pm.
Monday’s recitation session will not cover this assignment. Instead, it will be an opportunity for you to ask questions about, or get a recap of, other course material.
1 Directions and grading
1.1 Submission expectations
- Deadline: Monday Nov 24th at 11:59 pm (you are being given extra time because this assignment was provided to you so late)
- Submission: You will submit your assignment by tagging the instructors in an Issue in your GitHub repository, as described in the last step below
1.2 Academic integrity
| Use of generative AI Tools (e.g. ChatGPT, Microsoft Copilot, Google Gemini) is permitted | |
| Getting help on the assignment is not permitted | |
| Collaborating, or completing the assignment with others, is not permitted | |
| Copying or reusing previous work is not permitted | |
| Open-book research for the assignment is permitted | |
| APA Citations and/or formatting for this assignment are not required | |
1.3 Rubric
You can earn a total of 10 points:
- 9 points for the R part
- 1 point for the Git/GitHub part
2 Detailed steps
2.1 Setting up & Git
- Open your RStudio Server session at OSC. Create a new directory for this assignment, /fs/ess/PAS2880/users/$USER/GA6. This should be your working directory for the entire assignment.
- Create an empty Quarto file and save it inside the GA6 directory as GA6.qmd. This should be your main document for the assignment. Make this document self-contained and render it to HTML. In the YAML header, add the author name and date, and change the theme to cosmo.
- You can initialize a Git repository in GA6 either by opening VS Code and following the instructions taught by Jelmer, or by using the Terminal pane next to the Console in RStudio. Commit to the repo at least once before you push it to the remote repo.
Note: If you have not installed tidyverse yet, use the install.packages() function directly in your console instead of inside the .qmd file. Apart from installing packages, run all of your code in code chunks. Use a code chunk option to hide the warnings produced by your code.
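As a minimal sketch of the chunk-option approach mentioned in the note: Quarto reads options from special `#|` comment lines at the top of a code chunk. In GA6.qmd the chunk header would be `{r}` rather than `r`, and the vector `x` below is purely illustrative and not part of the assignment.

```r
#| warning: false
#| message: false

# Hypothetical example: converting "three" to a number triggers a coercion warning.
# With `warning: false`, Quarto leaves that warning out of the rendered HTML.
x <- as.numeric(c("1", "2", "three"))
x
```

Package startup text is often emitted as a message rather than a warning, which is why `message: false` is shown as well; whether you need it depends on what your chunk actually prints.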
3 Main assignment
3.1 R Basics
- Create a character vector fav_food with the names of five of your favorite foods. Replace the third element of the vector with a bird name, and extract the first and fifth elements.
- Load the tidyverse package. We have seen in class that loading the tidyverse package produces many warnings. Use a code chunk option to hide them. Read your metadata.tsv file from the garrigos-data/meta directory and save it as the object metadata. Write down the data structure of metadata, and its number of variables and rows.
- In the metadata object, replace dpi with _dpi in the time variable and cathemerium with cath in the treatment variable.
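The operations in this section (replacing and extracting vector elements, replacing text inside a column, and inspecting a data frame) can be sketched in base R as follows. The `planets` vector and `samples` data frame are hypothetical stand-ins, not the assignment's objects, and the tidyverse offers equivalents such as `stringr::str_replace()` inside `dplyr::mutate()`.

```r
# Hypothetical vector, unrelated to the assignment's fav_food
planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter")

# Replace one element by position, then extract several positions at once
planets[2] <- "Pluto"
picked <- planets[c(1, 4)]

# Hypothetical data frame with a text column to clean up
samples <- data.frame(
  id = 1:3,
  stage = c("2weeks", "4weeks", "control")
)

# sub() replaces the first match in each element;
# fixed = TRUE treats the pattern as literal text rather than a regular expression
samples$stage <- sub("weeks", "_wk", samples$stage, fixed = TRUE)

# Inspect the structure and the dimensions (rows, then columns)
str(samples)
dim(samples)
```

Elements that do not contain the pattern (here, "control") are returned unchanged.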
3.2 Data wrangling
Instead of creating new objects, always pipe your output unless explicitly stated otherwise.
- List all the datasets available in the ggplot2 package. Read the data from the built-in dataset midwest. Save it as an object midwest_datasets, and use midwest_datasets to answer all the questions that follow.
- Filter the rows with poptotal > 30000 and popdensity > 800.
- Select the first 11 variables of the dataset. Create a new column named asian_ameri in the midwest_datasets object by dividing popasian by popamerindian, and sort by asian_ameri in descending order. Save this output as the object new_dataset.
- Compute the mean total population per state and name the mean column poptotal.
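The wrangling verbs used in this section can be sketched on a different built-in dataset. The example below assumes dplyr is installed (it is part of the tidyverse) and uses `mtcars`; the thresholds and the `hp_per_cyl` and `mean_mpg` names are hypothetical and chosen only for illustration.

```r
library(dplyr)

# List the datasets bundled with a package (here: base R's 'datasets' package)
dataset_list <- data(package = "datasets")
head(dataset_list$results[, c("Item", "Title")])

# Filter rows, keep the first four columns, add a derived column,
# and sort by it in descending order, all in one pipeline
cars_small <- mtcars |>
  filter(mpg > 20, hp > 90) |>
  select(1:4) |>
  mutate(hp_per_cyl = hp / cyl) |>
  arrange(desc(hp_per_cyl))

# Grouped summary: one mean value per group
mean_mpg <- mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))
```

Only the final result of each pipeline is assigned to an object; the intermediate steps are passed along by the pipe.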
3.3 Quarto and Data Visualization
In class, we created scatter plots, box plots, and bar graphs. For this assignment, you will create violin plots from midwest_datasets, using the ggplot2 documentation as a guide.
- Create a violin plot with state on the x-axis and area on the y-axis. Facet the plot based on category.
- Create another violin plot with state on the x-axis and poptotal on the y-axis. Add jitter points to the plot and color just the points based on category. Is this point color a global or a local mapping? Please explain.
- In the above plot, change the theme to theme_bw and color the states manually instead of using the default colors. Add the title of the plot using the code chunk options. Save this plot as an object violin_plot.
- Export violin_plot and save it as violin_plot.png in your current working repository.
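The plotting steps above can be sketched on a different dataset. The example below assumes ggplot2 is installed and uses the built-in `iris` data; the `petal_size` column, the colors, and the output file name are hypothetical choices for illustration only.

```r
library(ggplot2)

# Hypothetical grouping column added to the built-in iris data
iris_demo <- transform(
  iris,
  petal_size = ifelse(Petal.Width > 1.5, "wide", "narrow")
)

p <- ggplot(iris_demo, aes(x = Species, y = Sepal.Length)) +  # aes() here is inherited by all layers
  geom_violin(aes(fill = Species)) +
  geom_jitter(aes(color = petal_size), width = 0.1) +         # aes() here applies to this layer only
  scale_fill_manual(
    values = c(setosa = "grey80", versicolor = "lightblue", virginica = "khaki")
  ) +
  theme_bw()

# Split the same plot into one panel per group
p_facet <- p + facet_wrap(~ petal_size)

# Write the plot to a PNG file (a temporary directory is used in this sketch)
out_file <- file.path(tempdir(), "iris_violin.png")
ggsave(filename = out_file, plot = p, width = 6, height = 4)
```

Passing `plot = p` explicitly makes `ggsave()` independent of which plot was displayed last.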
3.4 Publish your repo on GitHub
As before, you’ll publish your Git repo on GitHub and “hand in” your assignment by creating a GitHub Issue.
- Create a repository on GitHub, connect it to your local repo, and push your local repo to GitHub.
- Create a new issue and tag GitHub users menukabh and jelmerp, asking us to take a look at your assignment.