Cluebase 2.0 - Part 6: Towards a more useful Category
by Luke Lavin, published November 9th, 2024

In Jeopardy!, the category sometimes says more about the format of the clue than about the knowledge required to provide the correct response. Playing with subversive and witty clues is part of what makes Jeopardy! so beloved, but it makes the on-air category a poor guide for any enthusiast or hopeful contestant trying to glean info about the most relevant topics to study, or to find practice clues similar to those that typically stump them.

Ideally, Cluebase would be able to solve this by, in addition to keeping category info as-seen-on-TV, providing some sort of additional meta-category that truly portrays the subject matter of each clue.
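To make the idea concrete, here is a minimal sketch of what a clue record with both kinds of category might look like. The class name, the field names, and the example clue are all invented for illustration and are not Cluebase's actual schema; the `domain` value is just a placeholder label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clue:
    """Hypothetical clue record; field names are assumptions, not Cluebase's schema."""
    clue: str
    response: str
    category: str                 # kept exactly as seen on TV
    domain: Optional[str] = None  # assumed meta-category, filled in by a classifier later


# Invented example: the on-air category describes the format of the response,
# while the meta-category describes the subject matter to study.
example = Clue(
    clue="This red planet is named for the Roman god of war",
    response="What is Mars?",
    category="4-LETTER WORDS",
    domain="Science",  # placeholder label for illustration
)
```

Keeping `category` untouched and adding `domain` as an optional field means existing clues stay valid until a classifier has filled in the new field.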

Topic Modeling vs Domain Classification

Nvidia domain-classifier

explanation goes here

implementation example

Challenges:

Memory: 184M parameters * 4 bytes per parameter (fp32) = 736 MB for the weights alone.

Runtime: inference over the full dataset takes too long on CPU to run on free Prefect Cloud, even if memory restrictions don't cause problems. The plan is a one-time inference run on GPU across all clues, then short inference runs for new clues daily.

Batch size on GPU: speed kept improving all the way up through 256 and beyond, but 512+ caused intermittent OOMs, so I'm sticking with 256.
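The memory arithmetic and the fixed-size batching can be sketched in plain Python. This is a rough illustration rather than the actual pipeline: `predict` is a hypothetical stand-in for whatever function runs the classifier on one batch of clue texts, and the helper names are invented here.

```python
from typing import Callable, Iterator, List, Sequence

PARAMS = 184_000_000   # parameter count quoted above
BYTES_PER_PARAM = 4    # 32-bit floats
BATCH_SIZE = 256       # largest batch size that ran without intermittent OOMs


def weight_megabytes(params: int = PARAMS, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Size of the raw weights in MB (1 MB = 10^6 bytes), excluding activations."""
    return params * bytes_per_param / 1_000_000


def batches(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def classify_all(
    texts: Sequence[str],
    predict: Callable[[List[str]], List[str]],  # hypothetical: one label per input text
    batch_size: int = BATCH_SIZE,
) -> List[str]:
    """Run `predict` batch by batch and collect the labels in input order."""
    labels: List[str] = []
    for batch in batches(texts, batch_size):
        labels.extend(predict(batch))
    return labels


if __name__ == "__main__":
    print(f"weights: {weight_megabytes():.0f} MB")
    print([len(b) for b in batches(["clue"] * 600)])
```

The same `classify_all` call would cover both the one-time backfill (all clues) and the daily runs (only the new clues); only the size of `texts` changes.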