top of page

Are you labeling me? LabelEncoder in Google Sheets!

If you’ve ever worked with categorical or text variables, you know they can be quite difficult to handle. Challenges like visualization and complex formulas arise! While a Label Encoder doesn’t completely eliminate these challenges, it helps give your dataset more structure..

What we are going to learn:

  • Purpose of a Label Encoder

  • How to build a label-encoded dataset in Google Sheets


Purpose of a Label Encoder:

Label Encoder converts categorical data into numerical by assigning a unique integer to each category. This helps in reviewing correlations, reducing your storage and doing complex equations.


How to build a Label Encoded dataset in Google Sheets?

  1. Prepare your dataset

    1. Paste your raw dataset into the Dataset worksheet.

    2. Assume your data range is A2:D (adjust as needed).

Below is a dummy dataset created from ChatGPT.

ID

Color

Size

Brand

1

Red

Small

Nike

2

Blue

Large

Adidas

3

Green

Medium

Puma

4

Red

Large

Reebok

5

Blue

Small

Nike

6

Green

Large

Puma

  1.  Create a table to count unique data points per column

    1. Headers

      • Create a small table with two row headers:

        • Column

        • Count


          Next to the Column header, enter:

		=ArrayFormula(A2:D2)

This returns your column headers in one row.

Under the Count header, enter:

		=COUNTUNIQUE(A2:A)

Drag this formula across to count unique values in each dataset column.

Column

ID

Color

Size

Brand

Count

6

3

3

4

  1. Create a table listing unique data points in each column.

    1. Dimensions:

      • Number of rows = the largest unique count from the previous step.

      • Number of columns = number of dataset columns + 1 extra column (for numeric replacement values)

        In the first row of this table, enter again:.


		=ArrayFormula(A2:D2)

Under each header, enter:

		=UNIQUE(A2:A)

Drag this formula across for each dataset column.

Each column now shows the unique categorical values it contains.

b. Add a "Number" column

  • Name the last column Number.

  • Fill it with numbers from 1 to the maximum count of unique data points.

  • These numbers will be used as label-encoded replacements.

ID

Color

Size

Brand

Number

1

Red

Small

Nike

1

2

Blue

Large

Adidas

2

3

Green

Medium

Puma

3

4



Reebok

4

5





6





  1. Create the final label-encoded dataset:

    In the final output table, enter:

	=ArrayFormula(A2:D2)

Under the first categorical column, enter:

    =VLOOKUP(B3, $M$3:$P$6, 4, FALSE)

Where:

  • B3 = the value to encode

  • $M$3:$P$6 = your lookup table of unique values + numbers (adjust range)

  • 4 = column index in the lookup table containing the numeric label

  • FALSE = exact match

Drag the formula down the column.


For the next categorical column

Change the index value (the 3rd argument), e.g.:

   =VLOOKUP(C3, $M$3:$P$6, 3, FALSE)

Continue adjusting this number depending on how many columns to the right the numeric label is located.


Once you drag the VLOOKUP formulas across all dataset rows and all categorical columns, you will have a fully label-encoded dataset.


ID

Color

Size

Brand

1

1

1

1

2

2

2

2

3

3

3

3

4

1

2

4

5

2

1

1

6

3

2

3


You can make a copy of the sheet with steps from HERE.


Hope you enjoyed reading it! Thank you and have a great day!

 
 
 

Recent Posts

See All

Comments


DON'T MISS THE FUN.

Thanks for submitting!

FOLLOW ME ELSEWHERE

  • Facebook
  • Instagram

POST ARCHIVE

bottom of page