G2Net Kaggle Competition Idea Flow

Wabinab
2 min read · Nov 15, 2021

Whenever you participate in a competition, or just about anything, you learn something. Let me describe some things one noticed during this competition called G2Net.

  • Training with the whole dataset gives better results than training with a subset. In particular, this dataset does not work as well when training on a subset, perhaps because of the range of data the subset did not cover.
  • Insufficient memory (OOM) is a large problem. In particular, the approach is to save the engineered feature arrays to disk, using Zarr arrays (a minimal sketch follows this list). Even so, one encountered “insufficient memory” due to memory spikes when loading that exceed the RAM given by Kaggle. One solved this in two ways. In the end, one reduced the number of engineered features significantly, which significantly reduces the memory requirements.
  • Also, if the problem comes from saving the predictions as a pandas DataFrame, then saving them instead as NumPy arrays and deleting and garbage-collecting them before the next loop (epoch, model loop, or other loop) reduces memory as well (see the second sketch after this list). But even so, one still ran into insufficient memory at one point, and at that time could not figure out any other method than changing to an instance with more RAM than the 12 GB of a Kaggle GPU instance (it’s not 16 GB, for some reason).
  • scheduler.step() is called every step (i.e. every batch) for the one-cycle policy (OneCycleLR), instead of every epoch; see the last sketch after this list.
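
A minimal sketch of the Zarr idea above. The file name, sample count, feature shape, and chunk size are all hypothetical placeholders; the point is that the engineered features live on disk in chunks, so only one batch ever sits in RAM:

```python
import numpy as np
import zarr

# Hypothetical sizes: n samples, each engineered into a (3, 128) float array.
n_samples, feat_shape = 560_000, (3, 128)

# Create a chunked, on-disk Zarr array so the full feature set never lives in RAM.
store = zarr.open(
    "engineered_features.zarr",
    mode="w",
    shape=(n_samples, *feat_shape),
    chunks=(1024, *feat_shape),   # read/write 1024 samples at a time
    dtype="float32",
)

def engineer(indices):
    # Placeholder for the real feature engineering step.
    return np.random.rand(len(indices), *feat_shape).astype("float32")

# Write in batches; only the current batch is held in memory.
for start in range(0, n_samples, 1024):
    end = min(start + 1024, n_samples)
    store[start:end] = engineer(range(start, end))

# Later, read back lazily: only the requested slice is loaded from disk.
x = store[0:32]
```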
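
A sketch of the second remedy: keep predictions as NumPy arrays on disk instead of a growing DataFrame, and delete plus garbage-collect everything tied to a fold before the next loop. Names like `checkpoint_paths`, `build_model`, and `test_loader` are hypothetical stand-ins:

```python
import gc
import numpy as np
import torch

@torch.no_grad()
def predict(model, loader, device="cuda"):
    model.eval()
    preds = []
    for xb in loader:  # assumes the test loader yields input batches only
        out = model(xb.to(device))
        # Move to CPU NumPy immediately so nothing accumulates on the GPU.
        preds.append(out.sigmoid().cpu().numpy())
    return np.concatenate(preds)

for i, ckpt_path in enumerate(checkpoint_paths):   # hypothetical fold checkpoints
    model = build_model().to("cuda")               # hypothetical model factory
    model.load_state_dict(torch.load(ckpt_path))

    fold_preds = predict(model, test_loader)       # plain float32 NumPy array
    np.save(f"preds_fold{i}.npy", fold_preds)      # keep it on disk, not in a DataFrame

    # Free everything tied to this fold before the next iteration.
    del model, fold_preds
    gc.collect()
    torch.cuda.empty_cache()

# Build the submission DataFrame only once, at the very end.
mean_preds = np.mean(
    [np.load(f"preds_fold{i}.npy") for i in range(len(checkpoint_paths))], axis=0
)
```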
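
Finally, a sketch of the scheduler detail: with PyTorch’s OneCycleLR, scheduler.step() belongs inside the batch loop, not at the end of each epoch. The model, loss, and train_loader here are hypothetical placeholders:

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import OneCycleLR

model = nn.Linear(128, 1).cuda()          # hypothetical stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

epochs = 5
steps_per_epoch = len(train_loader)       # hypothetical DataLoader
scheduler = OneCycleLR(
    optimizer,
    max_lr=1e-3,
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
)

loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(epochs):
    model.train()
    for xb, yb in train_loader:           # assumes (input, target) batches
        optimizer.zero_grad()
        loss = loss_fn(model(xb.cuda()), yb.cuda())
        loss.backward()
        optimizer.step()
        scheduler.step()                  # every batch, NOT once per epoch
```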

Mistakes

Some improvements that one could learn from the competition, and be mindful of, include:

  • Not being tenacious enough: for the first month, one was active with it. After learning quite a lot, one moved on to other things. Perhaps next time one could stay in it longer and work on experimenting with things that aren’t available publicly (shared by others on the platform), or with some new ideas (stupid or not) that one can think of.
  • Particularly discovery: how to make new discoveries, learning from papers or other people’s ideas, and how to come up with one’s own ideas could also help a lot. That way one could do something beyond what was considered SOTA several years or months ago (which may still be relevant, but could be done better given the exponential speedup of ML development nowadays).
  • Embracing more difficult challenges, as opposed to just slightly more challenging ones. Of course, challenges are still challenges, but those that are within your scope of capability while maximizing the challenge take more time, yet let you learn the fastest as well. So that’s another lesson for myself.

Leave a message if you have any thoughts or tips for me, or share your own thought cycles or ideas if you would like!

Thank you for reading and have fun.

P.S. Currently looking for work.
Personal blog: wabinab.github.io (articles there are different from the articles published on Medium).
