Are you labeling me? LabelEncoder in Google Sheets!
- sataun
- 6 hours ago
- 2 min read
If you’ve ever worked with categorical or text variables, you know they can be quite difficult to handle. Challenges like visualization and complex formulas arise! While a Label Encoder doesn’t completely eliminate these challenges, it helps give your dataset more structure..
What we are going to learn:
Purpose of a Label Encoder
How to build a label-encoded dataset in Google Sheets
Purpose of a Label Encoder:
Label Encoder converts categorical data into numerical by assigning a unique integer to each category. This helps in reviewing correlations, reducing your storage and doing complex equations.
How to build a Label Encoded dataset in Google Sheets?
Prepare your dataset
Paste your raw dataset into the Dataset worksheet.
Assume your data range is A2:D (adjust as needed).
Below is a dummy dataset created from ChatGPT.
ID | Color | Size | Brand |
1 | Red | Small | Nike |
2 | Blue | Large | Adidas |
3 | Green | Medium | Puma |
4 | Red | Large | Reebok |
5 | Blue | Small | Nike |
6 | Green | Large | Puma |
Create a table to count unique data points per column
Headers
Create a small table with two row headers:
Column
Count
Next to the Column header, enter:
=ArrayFormula(A2:D2)This returns your column headers in one row.
Under the Count header, enter:
=COUNTUNIQUE(A2:A)Drag this formula across to count unique values in each dataset column.
Column | ID | Color | Size | Brand |
Count | 6 | 3 | 3 | 4 |
Create a table listing unique data points in each column.
Dimensions:
Number of rows = the largest unique count from the previous step.
Number of columns = number of dataset columns + 1 extra column (for numeric replacement values)
In the first row of this table, enter again:.
=ArrayFormula(A2:D2)Under each header, enter:
=UNIQUE(A2:A)Drag this formula across for each dataset column.
Each column now shows the unique categorical values it contains.
b. Add a "Number" column
Name the last column Number.
Fill it with numbers from 1 to the maximum count of unique data points.
These numbers will be used as label-encoded replacements.
ID | Color | Size | Brand | Number |
1 | Red | Small | Nike | 1 |
2 | Blue | Large | Adidas | 2 |
3 | Green | Medium | Puma | 3 |
4 | Reebok | 4 | ||
5 | ||||
6 |
Create the final label-encoded dataset:
In the final output table, enter:
=ArrayFormula(A2:D2)Under the first categorical column, enter:
=VLOOKUP(B3, $M$3:$P$6, 4, FALSE)Where:
B3 = the value to encode
$M$3:$P$6 = your lookup table of unique values + numbers (adjust range)
4 = column index in the lookup table containing the numeric label
FALSE = exact match
Drag the formula down the column.
For the next categorical column
Change the index value (the 3rd argument), e.g.:
=VLOOKUP(C3, $M$3:$P$6, 3, FALSE)Continue adjusting this number depending on how many columns to the right the numeric label is located.
Once you drag the VLOOKUP formulas across all dataset rows and all categorical columns, you will have a fully label-encoded dataset.
ID | Color | Size | Brand |
1 | 1 | 1 | 1 |
2 | 2 | 2 | 2 |
3 | 3 | 3 | 3 |
4 | 1 | 2 | 4 |
5 | 2 | 1 | 1 |
6 | 3 | 2 | 3 |
You can make a copy of the sheet with steps from HERE.
Hope you enjoyed reading it! Thank you and have a great day!



Comments