Graded Assignment 6: R Data manipulation and visualization

Authors: Menuka Bhandari, Jelmer Poelstra

Published: November 17, 2025



This graded assignment is worth 10 points and is due on Monday Nov 24th at 11:59 pm.

Monday’s recitation session will not cover this assignment. Instead, it will be an opportunity for you to ask questions about, or get a recap of, other course material.

1 Directions and grading

1.1 Submission expectations

  • Deadline: Monday Nov 24th at 11:59 pm (you are being given extra time because this assignment was provided to you so late)
  • Submission: You will submit your assignment by tagging the instructors in an Issue in your GitHub repository, as described in the last step below

1.2 Academic integrity

Use of generative AI Tools (e.g. ChatGPT, Microsoft Copilot, Google Gemini) is permitted
Getting help on the assignment is not permitted
Collaborating, or completing the assignment with others, is not permitted
Copying or reusing previous work is not permitted
Open-book research for the assignment is permitted
APA Citations and/or formatting for this assignment are not required

1.3 Rubric

You can earn a total of 10 points:

  • 9 points for the R part
  • 1 point for the Git/GitHub part

2 Detailed steps

2.1 Setting up & Git

  1. Open RStudio Server at OSC. Create a new directory for this assignment, /fs/ess/PAS2880/users/$USER/GA6. This should be your working directory for the entire assignment.

  2. Create an empty Quarto file and save it inside the GA6 directory as GA6.qmd. This should be your main document for the assignment. Make this document self-contained and render it to HTML. In the YAML header, add the author name and date, and change the theme to cosmo.

  3. Initialize a Git repository in GA6, either in VS Code following the instructions taught by Jelmer, or in the Terminal pane next to the Console in RStudio. Commit to the repo at least once before you push it to the remote repo.

Note: If you have not installed tidyverse yet, run the install.packages() function directly in your Console instead of inside the .qmd file. Apart from installing packages, run all your code in code chunks. Use a code chunk option to hide warnings produced by your code.
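As a rough illustration of the note above, the contents of a Quarto code chunk might look like the sketch below. It assumes tidyverse has already been installed once from the Console with install.packages("tidyverse"). Chunk options are written as "#|" comment lines at the very top of the chunk; the startup text that tidyverse prints is emitted as messages rather than warnings, which is why the sketch sets both options.

```r
#| warning: false
#| message: false

# Loading the package inside a chunk; the options above keep
# warnings and startup messages out of the rendered HTML.
library(tidyverse)
```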

3 Main assignment

3.1 R Basics

  1. Create a character vector fav_food with the names of five of your favorite foods. Replace the third element of the vector with a bird name, and extract the first and fifth elements.

  2. Load the tidyverse package. As we have seen in class, loading the tidyverse package prints many warnings. Use a code chunk option to hide them. Read the metadata.tsv file from the garrigos-data/meta directory and save it as an object named metadata. Report the data structure of metadata and its number of variables and rows.

  3. In the metadata object, replace dpi with _dpi in the time variable and cathemerium with cath in the treatment variable.
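The operations named in the steps above (replacing and extracting vector elements, inspecting a data frame, and replacing text within a column) can be sketched in base R on made-up data. Everything here (pets, toy, the column names and values) is hypothetical and unrelated to the assignment's files; it is only meant as a reminder of the syntax, not as the expected approach.

```r
# Hypothetical example vector
pets <- c("cat", "dog", "fish", "hamster", "rabbit")

# Replace one element by position (here the second)
pets[2] <- "apple"

# Extract several elements at once by passing a vector of positions
picked <- pets[c(1, 4)]
picked

# A toy data frame with a text column
toy <- data.frame(sample = c("s1", "s2"), stage = c("24hrs", "48hrs"))

# sub() replaces the first match of a pattern within each string
toy$stage <- sub("hrs", "_h", toy$stage)

str(toy)  # structure: column types plus number of rows and columns
dim(toy)  # number of rows, then number of columns
```

With tidyverse loaded, the same text replacement could also be written with mutate() and a stringr function, which fits more naturally into a pipe.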

3.2 Data wrangling

Instead of creating new objects, always pipe your output unless explicitly stated otherwise.

  1. List all the datasets available in the ggplot2 package. Read in the built-in dataset midwest and save it as an object named midwest_datasets. Use midwest_datasets to answer all of the following questions.

  2. Filter the rows with poptotal > 30000 and popdensity > 800.

  3. Select the first 11 variables of the dataset. Create a new column named asian_ameri in the midwest_datasets object by dividing popasian by popamerindian, and sort by asian_ameri in descending order. Save this output as an object named new_dataset.

  4. Compute the mean total population (poptotal) per state and name the column with the means poptotal.
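As a reminder of the dplyr verbs involved in the steps above, here is a minimal sketch on the built-in mtcars data frame rather than on midwest. The thresholds, the new column name, and the object names are arbitrary choices for illustration. The sketch uses the native |> pipe, which assumes R 4.1 or later; the magrittr %>% pipe would work the same way here.

```r
library(dplyr)

# Names of the datasets bundled with a package (here base R's "datasets")
head(data(package = "datasets")$results[, "Item"])

heavy_fast <- mtcars |>
  filter(hp > 100, wt > 3) |>     # keep rows meeting both conditions
  select(1:6) |>                  # keep the first six columns by position
  mutate(hp_per_wt = hp / wt) |>  # add a derived column
  arrange(desc(hp_per_wt))        # sort in descending order

# Group-wise mean: one row per value of cyl
by_cyl <- mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))
by_cyl
```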

3.3 Quarto and Data Visualization

In class, we created scatter plots, box plots, and bar graphs. For this assignment, you will read the ggplot2 documentation to create violin plots from midwest_datasets.

  1. Create a violin plot with state on the x-axis and area on the y-axis. Facet the plot by the category variable.

  2. Create another violin plot with state on the x-axis and poptotal on the y-axis. Add jittered points to the plot and color only the points by category. Is this point color a global or a local mapping? Please explain.

  3. In the above plot, change the theme to theme_bw and color the states manually instead of using the default colors. Add a title to the plot using the code chunk options. Save this plot as an object named violin_plot.

  4. Export violin_plot and save it as violin_plot.png in your GA6 repository.
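The ggplot2 building blocks mentioned in the steps above can be sketched on the mpg dataset that ships with ggplot2. The variables, colors, and output file name below are hypothetical choices for illustration, and the file is written to a temporary directory so that the sketch leaves nothing behind in a project folder.

```r
library(ggplot2)

# mpg ships with ggplot2; the variables are chosen only for illustration
p <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_violin(aes(fill = drv)) +
  # jittered points drawn on top of the violins; width limits horizontal spread
  geom_jitter(aes(color = class), width = 0.15) +
  # manually chosen fill colors, one per level of drv
  scale_fill_manual(values = c("4" = "grey80", "f" = "lightblue", "r" = "khaki")) +
  # one panel per value of year
  facet_wrap(vars(year)) +
  theme_bw()

# ggsave() infers the file type from the extension
out_file <- file.path(tempdir(), "example_violin.png")
ggsave(out_file, plot = p, width = 7, height = 4)
```

Passing the plot object explicitly via the plot argument avoids relying on ggsave()'s default of saving the last plot displayed.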

3.4 Publish your repo on GitHub

As before, you’ll publish your Git repo on GitHub and “hand in” your assignment by creating a GitHub Issue.

  1. Create a repository on GitHub, connect it to your local repo, and push your local repo to GitHub.

  2. Create a new issue and tag GitHub users menukabh and jelmerp, asking us to take a look at your assignment.
