It's time to split! Train-Test-Split in Google Sheets
- sataun
- Nov 27, 2025
- 5 min read
It's been a while and much has changed. Recently, I started learning about machine learning and realized that Google Sheets is often overlooked, even though Google Colab is more commonly used.
So here's an attempt to break down some concepts from Machine Learning into Google Sheets without using Python.
I will try to keep it as Non-Technical as possible in order to keep things simple.
What are we going to do here:
Learn what a train-test-split is
Learn how to implement a train-test-split from scratch
Implement it using only Google Sheet Formulas
What is a train-test-split:
As the name suggests, the concept is simply to split a dataset randomly or based on instructions into two parts- a training portion and a test portion. The purpose is to see how a machine learning model performs while it's being trained on the training portion. This allows you to make adjustments before testing the model on the test portion of the dataset. Based on the models performance in the test portion, further steps are taken.
The reason it is important is so that the model does not learn patterns while being trained and end up not performing well in real life situations. The same principle applies in real life. For example, we learn how to eat and adapt our learning when trying new dishes.
How to implement a train-test-split from scratch:
The first step is to get a dataset. A dummy dataset can be created from ChatGPT/Gemini or you can place your own. I've created this dummy dataset from ChatGPT.
Age | Income | Student | Buys Product |
25 | 50000 | 0 | 0 |
32 | 64000 | 1 | 0 |
47 | 120000 | 0 | 1 |
51 | 110000 | 0 | 1 |
62 | 150000 | 0 | 1 |
23 | 45000 | 1 | 0 |
44 | 99000 | 0 | 1 |
36 | 72000 | 1 | 0 |
29 | 58000 | 1 | 0 |
Place your dataset in a worksheet. I've placed this in the Dataset Worksheet.
Create a small table in order to identify how much of your dataset would be used for training and how much would be for testing. The following table shows a sample:
Test Size in decimals | 0.2 |
Train Size | 8 |
Test Size | 2 |
The formulas you may use to get the size of the training and test data are:
Train Size | =Round((1-Test Size %) * Length of Dataset) |
Test Size | =Round(Test Size % * Length of Dataset) |
Here Round is used to always get a whole number as decimals cannot be considered.
You may place the Test Size in the Test Size in Decimals and place the formulas below it referring it to the specific cell.
Next, add another column to your dataset called Random. This column will be used to generate random numbers which will later be used to rank and sort the rows. Here, use the formula:
=RAND() under the header and drag it till the last cell. It will generate random numbers between 0-1. Your dataset should now be like this:
Age | Income | Student | Buys Product | Random |
25 | 50000 | 0 | 0 | 0.546 |
32 | 64000 | 1 | 0 | 0.101 |
47 | 120000 | 0 | 1 | 0.119 |
51 | 110000 | 0 | 1 | 0.479 |
62 | 150000 | 0 | 1 | 0.827 |
23 | 45000 | 1 | 0 | 0.071 |
44 | 99000 | 0 | 1 | 0.300 |
36 | 72000 | 1 | 0 | 0.239 |
29 | 58000 | 1 | 0 | 0.337 |
40 | 85000 | 0 | 1 | 0.727 |
Next, add another column called Rank beside the Random column and use the formula =RANK.AVG(). You will be required to fill up some parameters in it. First would be value, so select the cell to its left from the Random column. Next would be the range to understand the ranking of the value. Select the entire Random column except the header. Finally, an optional value you may select 1 for ascending or leave it. So your end formula should look like:
=RANK.AVG(E2,$E$2:$E$11) where Column E is your Random column and E1 has the header. Your dataset should look like this now:
Age | Income | Student | Buys Product | Random | Rank |
25 | 50000 | 0 | 0 | 0.546 | 3 |
32 | 64000 | 1 | 0 | 0.101 | 9 |
47 | 120000 | 0 | 1 | 0.119 | 8 |
51 | 110000 | 0 | 1 | 0.479 | 4 |
62 | 150000 | 0 | 1 | 0.827 | 1 |
23 | 45000 | 1 | 0 | 0.071 | 10 |
44 | 99000 | 0 | 1 | 0.300 | 6 |
36 | 72000 | 1 | 0 | 0.239 | 7 |
29 | 58000 | 1 | 0 | 0.337 | 5 |
40 | 85000 | 0 | 1 | 0.727 | 2 |
The final part, select an area to place the train and test dataset. Once decided, select the cell for placement of the Train dataset and place the below formula:
=QUERY($A$1:$F, "SELECT * WHERE F <= " & $I$2, 1) . Breaking it down:
QUERY returns a portion of the dataset according to your requirements.
$A$1:$F, this is the entire range from where the query will select rows which match its parameters.
SELECT *, this is the condition which says get the rows which match your condition and the '*' means get the values from all the columns for that row
WHERE F <=& $I$2, this is the condition where Rank is being considered due to which F is being checked and & $I$2 this is the cell where the train test table from step 3 is being referred to. It simply means, where the value of the train is lesser than or equal to the value placed in the train size.
1, the last parameter indicates how many rows of headers to place in the output. You may keep it at 0 but I find tables with headers betters.
You should have your train dataset showing up in the query. Do note to see that nothing overlaps that area or else it will throw an error. You can even exclude empty rows and sort it by placing the below formula instead:
=QUERY(A1:F,"SELECT * WHERE F<="& I2 &" AND F IS NOT NULL ORDER BY F DESC",1)Either one works
The last query, select the cell where you want your test dataset to start showing up and place the below formula:
=QUERY($A$1:$F$N, "SELECT * WHERE F > " & $I$2, 1)
You might notice the similarity between the query from the previous step, the only difference here is the condition WHERE F > " & $I$2. Here values greater than the train size are being selected. You may even exclude empty rows and sort it by instead placing the below formula:
=QUERY(A1:F,"SELECT * WHERE F>"& I2 &" AND F IS NOT NULL ORDER BY F DESC",1) And that's it, you've just created your first train and test split!
Do share your thoughts! In order to make things easy, I've created a sample with steps HERE . Just make a copy of the sheet and use the formulas in it.
Thank you for reading! Hope you enjoy your day!



Comments