Categories
A category is a named list of values that certain field types can reference by name. Categories are defined in the categories section of the schema.
Each category is loaded from a file, which is expected to be in CSV format. By default, the first row is treated as a header row. The first column holds the value, and the optional second column holds its weight. If the second column is absent, the weight defaults to 1.0.
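The file format described above can be illustrated with a minimal Python sketch. This is not the tool's actual loader: the function name load_category and its header parameter are hypothetical, chosen only to mirror the header argument of the schema, and the sketch assumes a plain comma-separated file with no comment lines.

```python
import csv
import io


def load_category(text, header=True):
    """Parse category CSV text into a list of (value, weight) pairs.

    Hypothetical helper for illustration only; it mirrors the documented
    rules, not the tool's real implementation.
    """
    # Drop completely empty lines, which csv.reader yields as [].
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if header:
        rows = rows[1:]  # the first row is a header row by default
    entries = []
    for row in rows:
        value = row[0]  # first column: the value
        # Second column: the weight; assumed to be 1.0 when absent.
        weight = float(row[1]) if len(row) > 1 and row[1].strip() else 1.0
        entries.append((value, weight))
    return entries


unweighted = load_category("LETTER\nH\nE\nL\nL\nO\n")
weighted = load_category("LETTER,WEIGHT\nH,4\nE,1\nL,1\nL,1\nO,1\n")
```

In this sketch every entry of the unweighted file receives a weight of 1.0, while the weighted file keeps the weights from its second column.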
Schema
The schema below makes the contents of data/LETTERS.csv available as a category called LETTERS.
categories:
  - name: LETTERS
    file: "data/LETTERS.csv"
    header: true
Arguments
| Name | Type | Description | Default |
|---|---|---|---|
| name | string | The name of the category. This is the name by which it can be referenced in other parts of the schema. | |
| file | string | The path to the file containing the category data. | |
| header | bool | Whether the first row of the file is a header row. | true |
Example field using a category
categories:
  - name: LETTERS
    file: "data/LETTERS.csv"
fields:
  - name: letter
    type: WeightedCategory
    args:
      from_category: LETTERS
Example category file
# data/LETTERS.csv
LETTER
H
E
L
L
O
Example category file with weights
# data/LETTERS_WEIGHTED.csv
LETTER,WEIGHT
H,4
E,1
L,1
L,1
O,1
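To make the effect of the weights concrete, the following Python sketch computes the chance of each letter being picked from the weighted file above. It assumes that weights are relative (each weight is divided by the sum of all weights) and that repeated values such as L accumulate their weights. Neither assumption is confirmed by this page, and selection_probabilities is a hypothetical helper, not part of the tool.

```python
import random
from collections import defaultdict

# Entries from data/LETTERS_WEIGHTED.csv as (value, weight) pairs.
entries = [("H", 4.0), ("E", 1.0), ("L", 1.0), ("L", 1.0), ("O", 1.0)]


def selection_probabilities(entries):
    """Return value -> probability, assuming weights are relative."""
    total = sum(weight for _, weight in entries)
    probs = defaultdict(float)
    for value, weight in entries:
        # Duplicate values (the two L rows) add up, by assumption.
        probs[value] += weight / total
    return dict(probs)


probs = selection_probabilities(entries)
# Total weight is 8, so H is 4/8, L is 2/8, and E and O are 1/8 each.

# Drawing values with the standard library under the same assumption.
rng = random.Random(0)
values = [value for value, _ in entries]
weights = [weight for _, weight in entries]
sample = rng.choices(values, weights=weights, k=10)
```

Under these assumptions H would be chosen about half of the time, L about a quarter of the time, and E and O about one time in eight each.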