Metadata: making JSON files

In this use case, we read the JSON fields from a table and create JSON files with generic and file-specific information.

Code
R
Intermediate
Authors

F. Garassino (Center for Reproducible Science, UZH)

G. Fraga Gonzalez (Center for Reproducible Science, UZH)

Published

May 5, 2025

1. Creating JSON files from a metadata table

In this tutorial, we will show you how to create simple human- and machine-readable metadata files in JavaScript Object Notation (JSON). JSON files consist of fields of key-value pairs. These are sidecar metadata files, that is, they accompany a separate source data file and provide essential information about the data.

JSON metadata files differ from structured metadata files (i.e., tables) in their machine-readability. While structured metadata files may contain columns meant only for human readers (e.g., “comment” columns with free-text notes), JSON files should not. However, they can contain more details than the metadata tables.
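As a minimal sketch of what such a sidecar file contains, the snippet below builds a few key-value pairs in R and prints them as JSON with the jsonlite package. The field names and values are made up for illustration only.

```r
library(jsonlite)

# Hypothetical fields describing one data file (names chosen for illustration)
sidecar <- list(
  id = "subject_01",
  sex = "female",
  condition = "A"
)

# auto_unbox = TRUE writes length-1 vectors as plain values instead of arrays
sidecar_json <- toJSON(sidecar, pretty = TRUE, auto_unbox = TRUE)

# Print the JSON string: one "key": value pair per field
cat(sidecar_json, "\n")
```

Each name in the R list becomes a key, and each element becomes the corresponding value.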

Making JSON files

Good editors with a graphical interface are available online to read and write JSON files. We recommend the following: https://jsoneditoronline.org/

However, we recommend creating JSON files with a script rather than by hand, to save time and prevent data entry errors. This tutorial demonstrates how.

1.1 Setting the stage for this tutorial

Here, we will work with an example situation derived from an imaging experiment. We will automate the creation of JSON files from a metadata table containing image file names and locations, as well as information about the images (e.g., subject ID, subject sex, condition in which the subject was observed, treatment the subject received).

Requirements for creating the JSON files

  • A metadata table containing the names of the reference images and their metadata. If we rely entirely on the metadata in this table (provided it contains sufficient information), we do not need access to the actual files.

  • Optionally, additional table(s) with JSON fields. In that case, the JSON files will contain additional information not found in the metadata table.

  • A description of the file-naming convention, a codebook (i.e., an explanation of what each variable is), and a glossary of abbreviations used in the metadata table

  • The jsonlite R package, used to write the JSON string

  • Naturally, some R code

1.2 Let’s get to work!

Let’s assume a relatively common structure for the dummy dataset we’ll be using for this tutorial:

experiment_results   # The base folder of our dataset
│
├── ...   # Folder(s) with other kinds of data
│
└── imaging   # The folder containing the imaging data
      │
      ├── ...
      │
      └── subject_n   # Each measured subject has a folder
            │ 
            └── subject_n_imgfile.tiff  # The image file

 

The code we provide will parse a dummy metadata table to create one JSON file for each row of the table, which describes one data file (in this case, an image). Our script thus will generate companion files for all image files, as necessary for machine readability.

Machine readability

As we are automating a task, it is essential that our metadata table is formatted to be machine readable. This means that, when preparing the table, one should (among other things) avoid blank rows, avoid empty cells where possible, and use only the first row for header information (i.e., variable names).

Furthermore, the metadata table should be part of a spreadsheet that also contains a codebook explaining what each variable is. The number of variables (= the number of columns) and their names should be the same in the metadata table and in the codebook.

For further information on the readability of spreadsheets, see “Six tips for better spreadsheets” by J.M. Perkel.
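These checks can themselves be scripted. The sketch below uses only base R and assumes a small made-up metadata table and a codebook with a column called variable; both tables and all names in them are hypothetical.

```r
# Toy metadata table and codebook, both made up for this example
metadata_check <- data.frame(
  id = c("subject_01", "subject_02"),
  sex = c("female", "male"),
  stringsAsFactors = FALSE
)
codebook <- data.frame(
  variable = c("id", "sex"),
  description = c("Subject identifier", "Sex of the subject"),
  stringsAsFactors = FALSE
)

# TRUE for every cell that is missing (NA) or an empty string
is_empty_cell <- is.na(metadata_check) | metadata_check == ""

# Rows in which every cell is empty, i.e. blank rows
blank_rows <- which(rowSums(is_empty_cell) == ncol(metadata_check))

# Total number of empty cells in the table
n_empty_cells <- sum(is_empty_cell)

# Do the column names match the variables listed in the codebook, in order?
same_variables <- identical(names(metadata_check), codebook$variable)

blank_rows
n_empty_cells
same_variables
```

Running checks like these before generating the JSON files helps catch formatting problems while they are still easy to fix in the spreadsheet.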

Code
# these packages are required for correct functioning of the tutorial

library("dplyr") # data operations
library("kableExtra") # for rendering of tables in HTML or PDF

First of all, let’s create a simple dummy (or toy) metadata table:

Code
n_rows <- 30 # how many rows (in this case, how many study "subjects") we want in the table

metadata <- tibble(
  # creates "subject_n" entries with zero-padded n from 01 to n_rows
  id = paste("subject", sprintf("%02d", 1:n_rows), sep = "_"),
  # builds the path to each subject's image file
  img_location = paste0("experiment_results/imaging/subject_", sprintf("%02d", 1:n_rows), "/subject_", sprintf("%02d", 1:n_rows), "_imgfile.tiff"),
  # this and the following lines randomly fill a column with values drawn from a set of options (here, "male" or "female")
  sex = sample(c("male", "female"), size = n_rows, replace = TRUE),
  condition = sample(c("A", "B"), size = n_rows, replace = TRUE),
  treatment = sample(c("control", "treat_1", "treat_2", "treat_3"), size = n_rows, replace = TRUE)
)

Let’s take a look:

id img_location sex condition treatment
subject_01 experiment_results/imaging/subject_01/subject_01_imgfile.tiff female A treat_1
subject_02 experiment_results/imaging/subject_02/subject_02_imgfile.tiff female A control
subject_03 experiment_results/imaging/subject_03/subject_03_imgfile.tiff male B treat_3
subject_04 experiment_results/imaging/subject_04/subject_04_imgfile.tiff female B treat_1
subject_05 experiment_results/imaging/subject_05/subject_05_imgfile.tiff female A treat_3
subject_06 experiment_results/imaging/subject_06/subject_06_imgfile.tiff male B treat_1
subject_07 experiment_results/imaging/subject_07/subject_07_imgfile.tiff female B treat_1
subject_08 experiment_results/imaging/subject_08/subject_08_imgfile.tiff male B treat_3
subject_09 experiment_results/imaging/subject_09/subject_09_imgfile.tiff female A control
subject_10 experiment_results/imaging/subject_10/subject_10_imgfile.tiff male A treat_2
subject_11 experiment_results/imaging/subject_11/subject_11_imgfile.tiff female B treat_1
subject_12 experiment_results/imaging/subject_12/subject_12_imgfile.tiff female B control
subject_13 experiment_results/imaging/subject_13/subject_13_imgfile.tiff female B treat_3
subject_14 experiment_results/imaging/subject_14/subject_14_imgfile.tiff female B treat_3
subject_15 experiment_results/imaging/subject_15/subject_15_imgfile.tiff female B control
subject_16 experiment_results/imaging/subject_16/subject_16_imgfile.tiff female B treat_1
subject_17 experiment_results/imaging/subject_17/subject_17_imgfile.tiff female A control
subject_18 experiment_results/imaging/subject_18/subject_18_imgfile.tiff female A treat_2
subject_19 experiment_results/imaging/subject_19/subject_19_imgfile.tiff female A treat_1
subject_20 experiment_results/imaging/subject_20/subject_20_imgfile.tiff female B treat_1
subject_21 experiment_results/imaging/subject_21/subject_21_imgfile.tiff male A treat_1
subject_22 experiment_results/imaging/subject_22/subject_22_imgfile.tiff female B treat_2
subject_23 experiment_results/imaging/subject_23/subject_23_imgfile.tiff male B control
subject_24 experiment_results/imaging/subject_24/subject_24_imgfile.tiff female A treat_3
subject_25 experiment_results/imaging/subject_25/subject_25_imgfile.tiff male B treat_1
subject_26 experiment_results/imaging/subject_26/subject_26_imgfile.tiff female A treat_1
subject_27 experiment_results/imaging/subject_27/subject_27_imgfile.tiff female A treat_1
subject_28 experiment_results/imaging/subject_28/subject_28_imgfile.tiff male B control
subject_29 experiment_results/imaging/subject_29/subject_29_imgfile.tiff male A treat_3
subject_30 experiment_results/imaging/subject_30/subject_30_imgfile.tiff male A control
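Because the sex, condition, and treatment columns are drawn at random, your table will most likely differ from the one shown here. If you need the same dummy table on every run, you can fix the random seed first. A minimal sketch, with an arbitrarily chosen seed value:

```r
# Fix the random number generator state (the value 2025 is arbitrary)
set.seed(2025)
first_draw <- sample(c("male", "female"), size = 5, replace = TRUE)

# Resetting the same seed reproduces exactly the same draw
set.seed(2025)
second_draw <- sample(c("male", "female"), size = 5, replace = TRUE)

identical(first_draw, second_draw)
```

Calling set.seed() once, before the metadata table is created, is enough to make the whole table reproducible.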

Now, let’s make the corresponding JSON files. For readability, we use a for loop here to iterate over the rows of the metadata table. This approach can be very slow for large metadata tables, so below we will illustrate a faster alternative.

Code
library(jsonlite)
library(stringr)

saveoutput <- FALSE # set to TRUE to save the JSON files to disk

for (i in seq_len(nrow(metadata))) {
  # Create JSON for current row
  row_metadata <- metadata %>% 
    slice(i)
  json_metadata <- toJSON(row_metadata, pretty = TRUE, auto_unbox = TRUE)
  
  if (saveoutput) {
    # create the output path for the JSON
    json_path <- row_metadata %>% 
      pull(img_location) %>%  # this gives us the full path to the image
      str_replace("\\.tiff$", ".json") # and this swaps the .tiff extension for .json
    
    # write the JSON file next to the image (the target folder must already exist)
    write(json_metadata, file = json_path)
    print(paste0("Wrote ", json_path))
  }
  
}
Tip

The code snippet above can save the generated JSON files into the folders that contain the image files listed in the img_location column of the metadata table. To use this functionality, set the saveoutput variable to TRUE (or T).
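Writing a file fails if its target folder does not exist, which is the case if you run this tutorial without the dummy dataset on disk. The sketch below shows one way to try out the saving step safely. It assumes that a temporary directory stands in for the base folder of the dataset, and it uses a made-up single entry rather than the full metadata table.

```r
library(jsonlite)

# Assumption: a temporary directory stands in for the dataset's base folder
base_dir <- tempdir()
img_location <- "experiment_results/imaging/subject_01/subject_01_imgfile.tiff"

# Build the sidecar path by swapping the .tiff extension for .json
json_path <- file.path(base_dir, sub("\\.tiff$", ".json", img_location))

# Create the subject folder if it is missing
dir.create(dirname(json_path), recursive = TRUE, showWarnings = FALSE)

# Made-up metadata for one image, converted to a JSON string and written to disk
row_json <- toJSON(
  list(id = "subject_01", img_location = img_location),
  pretty = TRUE, auto_unbox = TRUE
)
write(row_json, file = json_path)

# Read the file back to confirm that it parses as JSON
read_back <- fromJSON(json_path)
read_back$id
```

Reading the file back with fromJSON is a quick way to confirm that what was written is valid JSON.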

The toJSON function of jsonlite converts an R object (in our case, a single row of the metadata table) into an R character vector of length 1, i.e., a single string formatted according to the JSON specification. After the loop has finished, json_metadata holds the result of the last iteration, so it describes the last row of the table. Here’s what one of our JSON strings looks like:

Code
print(json_metadata)
[
  {
    "id": "subject_30",
    "img_location": "experiment_results/imaging/subject_30/subject_30_imgfile.tiff",
    "sex": "male",
    "condition": "A",
    "treatment": "control"
  }
] 
Note

Notice that the JSON string starts with a [ and ends with a ]: toJSON writes a table as a JSON array with one element per row. The content of a row of the metadata table is a JSON object delimited by {}. This object contains column_name:value pairs separated by commas; because we set pretty = TRUE, each pair is also placed on its own line (separated by a newline, \n). Therefore, we could say that toJSON “expands” the row of the metadata table into a list describing each of its cells.
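If a sidecar file should hold a single JSON object rather than a one-element array, one option is to convert the row to a named list before calling toJSON. The sketch below compares the two forms on a made-up one-row table; the table and variable names are for illustration only.

```r
library(jsonlite)

# Made-up one-row table standing in for a single row of the metadata table
one_row <- data.frame(id = "subject_30", sex = "male", stringsAsFactors = FALSE)

# A data frame is written as a JSON array of row objects: [ { ... } ]
as_array <- toJSON(one_row, auto_unbox = TRUE)

# A named list is written as a single JSON object: { ... }
as_object <- toJSON(as.list(one_row), auto_unbox = TRUE)

# Both results are character vectors of length 1, i.e. one JSON string each
length(as_array)
length(as_object)

cat(as_array, "\n")
cat(as_object, "\n")
```

Which form to prefer depends on the tools that will later read the files, so it is worth deciding on one and using it consistently.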

And just like that, you’ve created your first JSON files. Congratulations!

1.3 Code

This is all the code you need to run what we covered in this section of the tutorial, grouped in one place.

Code
library(dplyr)    # tibble(), slice(), pull() and the %>% pipe
library(jsonlite) # toJSON()
library(stringr)  # str_replace()

n_rows <- 30 # how many rows (in this case, how many study "subjects") we want in the table

metadata <- tibble(
  # creates "subject_n" entries with zero-padded n from 01 to n_rows
  id = paste("subject", sprintf("%02d", 1:n_rows), sep = "_"),
  # builds the path to each subject's image file
  img_location = paste0("experiment_results/imaging/subject_", sprintf("%02d", 1:n_rows), "/subject_", sprintf("%02d", 1:n_rows), "_imgfile.tiff"),
  # this and the following lines randomly fill a column with values drawn from a set of options (here, "male" or "female")
  sex = sample(c("male", "female"), size = n_rows, replace = TRUE),
  condition = sample(c("A", "B"), size = n_rows, replace = TRUE),
  treatment = sample(c("control", "treat_1", "treat_2", "treat_3"), size = n_rows, replace = TRUE)
)

saveoutput <- FALSE # set to TRUE to save the JSON files to disk

for (i in seq_len(nrow(metadata))) {
  # Create JSON for current row
  row_metadata <- metadata %>% 
    slice(i)
  json_metadata <- toJSON(row_metadata, pretty = TRUE, auto_unbox = TRUE)
  
  if (saveoutput) {
    # create the output path for the JSON
    json_path <- row_metadata %>% 
      pull(img_location) %>%  # this gives us the full path to the image
      str_replace("\\.tiff$", ".json") # and this swaps the .tiff extension for .json
    
    # write the JSON file next to the image (the target folder must already exist)
    write(json_metadata, file = json_path)
    print(paste0("Wrote ", json_path))
  }
  
}

2. Creating JSON files with a metadata table and information from additional files

The example above covered the simplest situation you might encounter when creating JSON files. In a more realistic situation, you have created a metadata table but want the JSON files to combine its information with information from other files.

2.1 Setting the stage for this tutorial

In this part of the tutorial, we will create a script that allows you to customise the JSON-making process by editing a table that is used to specify which fields will be included in the JSON files. The information for these fields will be extracted from both the metadata table and the file names of the data files: