Guide to creating an R Markdown Kaggle kernel

Overview

Every new Kaggle kernel must begin as a branch off of an existing Kaggle dataset. For example, see the “New Notebook” button on the right side of this page:

https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results

If you click that button, you will be creating a notebook or ‘kernel’ that is only allowed to see the data that is included in this dataset (which can include multiple files).

Now let’s see what the inside of a Kaggle kernel looks like. Look at the ‘Code’ view of this kernel and see the raw R Markdown code that was used to generate it:

https://www.kaggle.com/heesoo37/olympic-history-data-a-thorough-analysis/code

This is the rendered report:

https://www.kaggle.com/heesoo37/olympic-history-data-a-thorough-analysis/report

The R Markdown script loads the data directly from Kaggle, and the kernel only has access to the data that is stored in its associated Kaggle dataset. You can see that the kernel is explicitly associated with this dataset:

https://www.kaggle.com/heesoo37/olympic-history-data-a-thorough-analysis/data

This means that the code in the kernel must point to the location of the data files on Kaggle, just like how you had to point to the files on your computer when running the code locally. As I will explain later, the data files on Kaggle always live in the relative path ../input/.
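A quick way to confirm what the kernel can see is to list the contents of that folder from inside the kernel. This is a minimal sketch, assuming the kernel runs from its default working directory; list_input_files is a hypothetical helper name, not something Kaggle provides.

```r
# List every file under the Kaggle input folder.
# `input_dir` defaults to "../input", the relative path described above.
list_input_files <- function(input_dir = "../input") {
  # recursive = TRUE also shows files nested in subfolders;
  # an empty character vector means nothing was found at that path
  list.files(input_dir, recursive = TRUE, full.names = TRUE)
}

print(list_input_files())
```

Running this on your own computer will usually print an empty result, because ../input/ is unlikely to exist there.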

Your Kaggle kernel will require code that is very similar to your existing R Markdown documents, but with a few important differences:

You must NOT include the knitr::opts_chunk block of code at the top of your R Markdown documents (that is only there for RStudio, so you should not include it in the Kaggle code).
You will need to make changes to the R Markdown YAML header, as detailed in the step-by-step instructions below.
Your script must read the data directly from Kaggle, and your R Markdown script will run on Kaggle’s platform using their version of the data. This means that your script must work from the top with the data exactly as it is formatted on Kaggle. If you made any changes to your files manually after downloading them, you will have to figure out how to incorporate those changes as pre-processing steps using R code in your R Markdown document itself.
If the previous point causes problems for you, an alternative is to upload the modified version of the dataset that you wish to use, and it will have its own page on Kaggle with you as the creator. If you choose to create your own dataset, please follow all of Kaggle’s instructions for creating a new dataset (https://www.kaggle.com/datasets), and fill out the information it asks for (say exactly where the original data came from and what changes you made to it). You can then upload your data and create a new kernel starting from your new dataset, and it will run with your modified data on Kaggle’s platform.
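To illustrate what turning manual edits into pre-processing steps can look like, here is a minimal sketch. The data frame, the column names, and the clean_data function are all hypothetical stand-ins; in a real kernel the raw data would come from a file under ../input/ and the edits would be whatever you changed by hand.

```r
# Hypothetical raw data standing in for a file read from ../input/
raw <- data.frame(
  Name = c("A", "B", "C"),
  Height = c("180", "unknown", "165"),
  stringsAsFactors = FALSE
)

# Each manual edit becomes one explicit, repeatable step
clean_data <- function(df) {
  # Edit 1: rename a column (previously done by hand in a spreadsheet)
  names(df)[names(df) == "Height"] <- "height_cm"
  # Edit 2: convert text to numbers; entries that are not numbers become NA
  df$height_cm <- suppressWarnings(as.numeric(df$height_cm))
  # Edit 3: drop rows where the height is missing
  df[!is.na(df$height_cm), , drop = FALSE]
}

cleaned <- clean_data(raw)
print(cleaned)
```

Keeping the steps in one function near the top of the document means the kernel starts from Kaggle’s copy of the data every time it runs.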

Step-by-step instructions

Go to your dataset on Kaggle, e.g., https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results
Click the “New Notebook” button (labeled “New Kernel” in some versions of the interface).
Select the “Script” option and a script will open (by default, it will be a Python script).
Click the dropdown menu at the top of the page and change it from ‘Python’ to ‘R Markdown’. You will see the syntax highlighting change, because now the kernel is reading the script as R Markdown rather than Python (annoyingly, it doesn’t update the starter code).
Delete all the text in the kernel.
Enter the YAML header (the block of settings surrounded by --- at the very top of your R Markdown script), and replace the title with your project title. You don’t have to include your name the way you do in the homework, because Kaggle already shows your profile alongside the kernel. You can modify these settings if you wish: start simple to get your kernel up and running, then look at other R Markdown kernels you like and see what they put in their YAML header to improve the appearance of their document. You can change the theme, figure size options, etc. This example YAML header is taken from the Olympic History kernel I pointed to earlier:

---
title: 'Olympic history: a thorough analysis'
output:
  html_document:
    number_sections: true
    toc: true
    fig_width: 8
    fig_height: 5
    theme: cosmo
    highlight: tango
    code_folding: hide
---

Copy and paste everything from your R Markdown file into the Kaggle kernel beneath the YAML header, starting BELOW the knitr::opts_chunk block of code. You do NOT want to include that.
Go to the part of the script where you load data. Replace whatever you were using locally (e.g., file.choose() or a file path on your system) with the correct paths on Kaggle’s platform. The files for your dataset will always be located under ../input/, followed by the names of your files, e.g., ../input/data.csv. Check the organization of your dataset to see what the file structure is.
Add the title for your script in the box at the top left of the kernel. After doing this, the ‘Commit’ button on the top right of the page will change from gray to blue. You can now commit your work, and Kaggle will process your R Markdown document and produce a rendered version.
Hit the ‘Commit’ button! It may take a while to run. If it fails to compile, read the error messages carefully and use Google! When it runs successfully, you will see an ‘Open Version’ button appear, and you can click it to view the rendered kernel.
Any time you want to edit the code, you can hit the ‘Edit’ button in the top right corner of this kernel view, alter the code, and run the script again.
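If you want the same script to run both on Kaggle and on your own computer while you are still editing, one option is to choose the data folder at run time. This is only a sketch under assumptions: find_data_dir is a hypothetical helper, "data" is a hypothetical local folder name, and data.csv is the placeholder file name used in the steps above.

```r
# Pick the data folder: "../input" when it exists (as on Kaggle),
# otherwise fall back to a local folder.
# `local_dir` is a hypothetical folder name; change it to match your machine.
find_data_dir <- function(kaggle_dir = "../input", local_dir = "data") {
  if (dir.exists(kaggle_dir)) kaggle_dir else local_dir
}

# Build the full path to the file; "data.csv" is a placeholder name
data_path <- file.path(find_data_dir(), "data.csv")
print(data_path)

# Uncomment once the path points at a real file:
# my_data <- read.csv(data_path, stringsAsFactors = FALSE)
```

Hard-coding the ../input/ paths as described in the steps works just as well for the committed kernel; the helper only saves you from editing paths back and forth.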

Final words

If you feel stuck, please look at other Kaggle kernels for guidance. You can look at the code in the R Markdown kernels to see exactly how they wrote their kernel in order to get their work to display.

Randi H Griffin

June 8, 2019