Over the last few days, I have been trying to write a "canonical Q&A" on the class imbalance "problem" for Data Science Stack Exchange, because there is so much confusion about it over there. I think @Dave will know what I mean.
As many here will know, on CV we have some great threads on this topic, including these:
What is the root cause of the class imbalance problem?
When is unbalanced data really a problem in Machine Learning?
Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?
Brier Score and extreme class imbalance
There are so many great answers and comments, and happily a reasonable consensus has formed. I want to credit the right people, and that is what I have tried to do.
I'm sure that I have missed things, and if anyone would like me to add their contributions, please let me know!
Here is the link to the Data Science.SE thread: Is class imbalance really a problem in machine learning?