Saturday, April 1, 2017

Monkey Business - Back with a vengeance

Some of my regular visitors may remember my post about all the trouble I had finishing the Monkey Business programming challenge. As the title of this blog post suggests, it came back with a vengeance, this time in the form of the Mode programming challenge. Before I share this story with you, grab a cup and make yourself comfortable, as I foresee that this post is going to be a long one.

Picture Copyright: My own work, a free for all.

When I started this challenge, it looked quite easy: a day, maybe two, and it should be done. What was asked for seemed clear enough:

"Create an array, fill it with numbers, discover the highest number, and return it. Or return -1 if no   number occurs more frequently than any others."

This lulled me into a false sense of security. With my guard down and no research about mode(s) done, I simply planned out my program, all the functions it should have, and so on. Writing the code took half a day at most. Everything was working, and the code along with some screenshots was ready to be uploaded, or so I thought. Then I started thinking ... and instead of uploading the supposedly finished program, I started doing some research on the topic of mode(s).

The very first thing I learned is that a set of numbers can contain more than one mode. Most of the examples I found looked like this: 1, 2, 2, 2, 3, 3, 3, making the modes of this set 2 and 3. Up to this point I wasn't considering changing any of my existing code, because the challenge didn't ask to return more than one mode, only the mode, should there be one, or -1 if none is found. That last condition was already covered by my original code. This newfound knowledge then led me to believe that a set of numbers like 2, 2, 3, 3, 4, 4, 5, 5 does not contain a mode. The same goes for this set: 3, 3, 3, 4, 4, 4. Both numbers are there 3 times, so there is no mode.

Of course this had to be taken into consideration, so I tried to change my code accordingly. I failed, and failed, and failed some more. I worked on it for days, and all my efforts led to my having to start over. Eventually, about 7 days in, I reached the point where I lost all hope of ever finding a way to solve this problem. In desperate need of help, I turned to a coding community called http://www.cplusplus.com/ which I suppose many of you know. Describing what I thought was the core of my problem, I asked for a simple line of code, an if-statement that I couldn't come up with myself, believing it would suffice.

In the course of an exchange with an experienced member going by the nick of lastchance, who helped me out so much, I discovered what the real problem was. At first a mere question was asked, one that, knowing at least something about mode(s), I thought I could answer easily: given 1, 2, 2, 3, 3, 4, 4, 5, which do I think is or are the mode(s)? My answer was 2, 3, and 4, because each of them occurs two times, making it three modes. Suffice to say that this question alone caused me to feel very stupid at first, but in turn helped me to discover the core of the actual problem.

This changed my view of the problem, which I was in part causing myself, by knowing that there can be a single mode as well as more than one. Still, the problem to solve remained in part the same:
A set like this, 1, 2, 2, 3, 3, 4, according to the websites explaining modes in statistics, has two modes, and a mode is the number occurring most frequently. It took some more time to learn from lastchance that 2, 2, 2, 3, 3, 3 likewise contains two modes, 2 and 3.

I confirmed this myself by visiting calculatorsoup.com, a great website for doing all sorts of calculations, along with several others, all explaining that:
"Mode is the value or values in the data set that occur most frequently."
Yet, when entering 2, 2, 3, 3, 4, 4, not only the above-mentioned website but all the others as well would say that the modes are 2, 3, 4. lastchance provided a code example delivering much the same result. In light of this new discovery, and after numerous failed attempts to return -1 if there is in fact more than one mode, I considered adding another array containing the mode(s), and having the function return -1 if there is indeed no mode. Without this the program would not be complete. Also, a histogram would be nice to have.
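
For fellow learners, here is a minimal sketch of that idea, and not the code from my program or from lastchance. The name findModes is made up for illustration, and it uses std::map and std::vector instead of raw arrays. It also rests on one assumption of mine: a set in which no number repeats is treated as having no mode, so the result is empty and the caller can return -1.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Hypothetical helper: returns every value that shares the highest frequency.
// Assumption: if no value repeats (highest frequency is 1), the result is empty,
// and the caller can report -1 for "no mode".
std::vector<int> findModes(const std::vector<int>& numbers)
{
    std::map<int, int> frequency;          // value -> number of occurrences
    for (int n : numbers)
        ++frequency[n];

    int highest = 0;                       // highest frequency seen
    for (const auto& entry : frequency)
        highest = std::max(highest, entry.second);

    std::vector<int> modes;
    if (highest < 2)
        return modes;                      // nothing repeats: no mode

    for (const auto& entry : frequency)
        if (entry.second == highest)
            modes.push_back(entry.first);  // std::map iterates in ascending key order
    return modes;
}
```

Called with 2, 2, 3, 3, 4, 4 this returns 2, 3 and 4, matching what the websites report, and because the counting goes through a map, the order in which the numbers are entered does not matter.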

With this I started out to plan and write the umpteenth version of the program. This time it would work out, or so I thought. Upon completion I found out about a major problem that made it yet another failed attempt at solving the challenge. The reason for the failure was not very obvious, and the problem as such surfaced only after I had finished writing the code. Up to this point I had been working with simple input to fill numList, the array containing all the numbers.

When entering 1, 1, 2, 2, 3, 3, the output was correct as far as the modes were concerned, and so were the histogram, frequency and numbers. Then I entered the following sequence: 1, 2, 1, 2, 1, 2, 1, 2, expecting that the output would be:

mode(s): 1
mode(s): 2

Instead I got both 1 and 2 displayed numerous times, each being counted only once by the function that determines the frequency of numbers. My first instinct told me that this was merely a matter of changing the if statement dealing with the output of the modes, which in its initial version looked like this:

// Print a stored mode only if it differs from the element right after it
if (*(multiMode + index) > 0 && *(multiMode + index) != *(multiMode + index + 1))
{
     cout << "Mode(s): " << *(multiMode + index) << "\n";
}
This didn't help at all. Remember the word histogram! The output from that function was the same as the one from the mode function: 1, 2, 1, 2, 1, 2, 1, 2 freq: 1 ..... 1. So I added the same condition to this function as well, hoping for the correct result, being 1, 2 - 2, 2. Of course it didn't work out that way. The next thing I tried to change was the function that counts the frequency; again, no luck. It got so far that I even changed my sorting function, of which I initially had only one, so that it would sort by numbers, not frequency. The lesson to be learned, which I learned the hard way, is this:

"If parts of your code do not work as they are supposed to, don't try to add fixes in different parts of your code, because you are likely to introduce even more errors."

But what was the problem, then? In one word: Sorting. As mentioned, I had one sorting function, sorting both frequency and numList in descending order. In my next revision I had one for numList and one for frequency, until in my final version I had one for frequency and a dual sort for both arrays, which finally led to working code that does what it should. I must admit, though, that the histogram looks a bit strange as far as the output of numbers goes. I considered this fine as is, as long as the program does the job it is supposed to.
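
To show why sorting matters here, the following is a rough sketch rather than my final program. It assumes the frequency count works by comparing neighbouring elements, which is what the output for 1, 2, 1, 2, 1, 2, 1, 2 suggests. The names Tally, countRuns and dualSort are invented for this example, and std::vector stands in for the raw arrays.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical structure: two parallel arrays, values[i] occurs frequency[i] times.
struct Tally
{
    std::vector<int> values;
    std::vector<int> frequency;
};

// Sorts a copy of the input first so equal numbers sit next to each other,
// then counts each run of equal numbers once.
Tally countRuns(std::vector<int> numList)
{
    std::sort(numList.begin(), numList.end());
    Tally tally;
    for (int n : numList)
    {
        if (!tally.values.empty() && tally.values.back() == n)
        {
            ++tally.frequency.back();      // same number as before: extend the run
        }
        else
        {
            tally.values.push_back(n);     // new number: start a new run
            tally.frequency.push_back(1);
        }
    }
    return tally;
}

// Dual sort (a selection sort): orders by frequency, highest first, and swaps
// both arrays together so every value stays paired with its own count.
void dualSort(Tally& tally)
{
    for (std::size_t i = 0; i + 1 < tally.frequency.size(); ++i)
    {
        std::size_t best = i;
        for (std::size_t j = i + 1; j < tally.frequency.size(); ++j)
            if (tally.frequency[j] > tally.frequency[best])
                best = j;
        std::swap(tally.frequency[i], tally.frequency[best]);
        std::swap(tally.values[i], tally.values[best]);
    }
}
```

With the numbers sorted before counting, 1, 2, 1, 2, 1, 2, 1, 2 produces just two entries, 1 and 2, each with a count of 4. The important part of the dual sort is that every swap is applied to both arrays; sorting frequency on its own would separate the counts from the numbers they belong to.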

So, to my fellow learners, as well as readers old and new, whom I heartily welcome to my humble abode, I hope you will be able to learn something from this. And my hope for the next challenge, which involves writing a function to calculate the median, is that it turns out to be an easy one! With this, it is time to get back to my IDE, and write some more code.
