Categories

Categories define named lists of values that certain fields can reference by name. They are declared in the top-level categories section of the schema.

The category file is expected to be in CSV format. By default, the first row is treated as a header row; set header to false if the file has none. The first column holds the value, and the optional second column holds its weight. If there is no second column, the weight defaults to 1.0.
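
As an illustration, here is a minimal Python sketch of these parsing rules. The helper name load_category and its signature are hypothetical and not part of the tool; how the real loader treats blank lines, surrounding whitespace, or extra columns is not documented here, so the choices below (skip blank lines, ignore columns after the second) are assumptions.

```python
import csv
import io


def load_category(text: str, header: bool = True) -> list[tuple[str, float]]:
    """Parse category CSV text into (value, weight) pairs.

    Hypothetical helper following the rules described above:
    - the first row is skipped when header is True,
    - column 1 is the value,
    - column 2, if present and non-empty, is the weight; otherwise 1.0.
    Columns after the second are ignored (an assumption).
    """
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        rows = rows[1:]  # drop the header row
    entries = []
    for row in rows:
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        value = row[0]
        weight = float(row[1]) if len(row) > 1 and row[1] != "" else 1.0
        entries.append((value, weight))
    return entries


print(load_category("LETTER,WEIGHT\nH,4\nE,1\n"))
# [('H', 4.0), ('E', 1.0)]
```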

Schema

The following schema makes the contents of data/LETTERS.csv available as a category named LETTERS.

categories:
  - name: LETTERS
    file: "data/LETTERS.csv"
    header: true

Arguments

Name   | Type   | Description                                                                                      | Default
name   | string | The name of the category. This is the name by which it is referenced in other parts of the schema. | -
file   | string | The path to the file containing the category data.                                               | -
header | bool   | Whether the first row of the file is a header row.                                               | true

Example field using a category

categories:
  - name: LETTERS
    file: "data/LETTERS.csv"
fields:
  - name: letter
    type: WeightedCategory
    args:
      from_category: LETTERS

Example category file

# data/LETTERS.csv
LETTER
H
E
L
L
O
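
Assuming every row is a separate entry with the default weight of 1.0 and that values are drawn in proportion to their weights (neither is stated explicitly above), the repeated L in this file doubles its chance of being picked:

```latex
P(v) = \frac{\sum_{i \,:\, x_i = v} w_i}{\sum_{j=1}^{n} w_j}, \qquad w_i = 1.0,\; n = 5

P(\mathrm{L}) = \frac{1 + 1}{5} = \frac{2}{5}, \qquad
P(\mathrm{H}) = P(\mathrm{E}) = P(\mathrm{O}) = \frac{1}{5}
```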

Example category file with weights

# data/LETTERS_WEIGHTED.csv
LETTER,WEIGHT
H,4
E,1
L,1
L,1
O,1
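
Under the same assumptions (duplicate rows add their weights, and selection is proportional to weight), the probabilities for this file come from dividing each value's total weight by the overall total, 4 + 1 + 1 + 1 + 1 = 8. The Python sketch below does that arithmetic; selection_probabilities is a hypothetical helper, not part of the tool.

```python
from collections import defaultdict


def selection_probabilities(entries: list[tuple[str, float]]) -> dict[str, float]:
    """Sum the weights of duplicate values and normalize them to probabilities.

    Assumes (not stated in the docs) that duplicate rows add their weights
    and that selection is proportional to weight.
    """
    totals: dict[str, float] = defaultdict(float)
    for value, weight in entries:
        totals[value] += weight
    grand_total = sum(totals.values())
    return {value: w / grand_total for value, w in totals.items()}


# Rows of data/LETTERS_WEIGHTED.csv after the header row is skipped.
weighted_letters = [("H", 4.0), ("E", 1.0), ("L", 1.0), ("L", 1.0), ("O", 1.0)]
print(selection_probabilities(weighted_letters))
# {'H': 0.5, 'E': 0.125, 'L': 0.25, 'O': 0.125}
```

So, under these assumptions, H is drawn about half the time, L a quarter of the time, and E and O an eighth each.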