Interesting, repeated values give the model a lot more confidence of the known values. The interpolated #4 value is still off by 12%. It does not extrapolate well at all.
Looking forward to trying it on real world data with more features.
Yes! This makes sense from a learning perspective: More samples add additional evidence the datapoint is actually what you observed - based on one sample the model is closer to a mean regression (which would translate to more balanced class probabilities in classification).
Transformers have trouble counting repeated entries (there was a famous failure case of ChatGPT, asking it to count the number of 1s and 0s in a string). This model has some tricks to solve this.
Just playing around with regression mode...
... well, it has a positive slopeLet's see what happens if we copy the exact same values in the dataset 10 times first.
Interesting, repeated values give the model a lot more confidence of the known values. The interpolated #4 value is still off by 12%. It does not extrapolate well at all.Looking forward to trying it on real world data with more features.