Friday, April 27, 2012

Project:Sorter Discussion

Hi all,

Here's a place for any issues you may wish to discuss related to Project:Sorter.

40 comments:

  1. How many different objects are you expecting to test for sorting both merge and quicksorts in the first milestone?

    ReplyDelete
    Replies
    1. Your solution should allow for sorting of any number of elements (assuming there is memory to allocate those elements).

      For example, in my solution I tested on lists of size 1 million elements.

      Joe

      Delete
  2. For the QuickSort class, would it be better to utilize a LinkedList instead of an ArrayList? Given the nature of how the quick sort algorithm is supposed to work, it seems to me sequential access of the elements would be more efficient. However, on my computer, quick sorting with an ArrayList is faster than a LinkedList.

    ReplyDelete
    Replies
    1. QuickSort will work much faster with an ArrayList. This is because if you wish to access an element inside of a LinkedList, you must iterate through until you reach that element. For more details, check out section 4 of the wikipedia article on Linked Lists: http://en.wikipedia.org/wiki/Linked_list#Tradeoffs

      Delete
    2. Actually, if QuickSort is implemented properly, it shouldn't be much slower on (doubly) linked lists than on arrays.

      Joe is correct that you have to iterate through a LinkedList to find a particular element, which is very slow. But this is not something you should have to do during QuickSort.

      Delete
  3. When we start threading our code, will there be a significant performance improvement? I'm working with using different numbers of threads right now, and there doesn't seem to be a large improvement. Sometimes it goes slightly slower, even. Right now I'm running tests on multi-threaded quicksorting. The version I'm running isn't in my repository yet, but I'll put it in there if you want to see it.

    ReplyDelete
    Replies
    1. Hi Max,

      If you've done it right, and you are running on a multicore machine, you should see a noticeable improvement, for big lists. For very small lists, making multiple threads will probably slow you down slightly.

      I've glanced over your code and have some suggestions.

      First thing: Don't just modify your MergeSorter and QuickSorter from the first checkpoint, to have multiple threads. Make new classes. That will also make it more convenient when you want to compare the performance.

      Second thing: You have a race condition on your currentThreads variable.

      Third thing: You seem to be doing excessive recopying of your lists. This may be hurting your performance.

      Since I don't know exactly what tests you've done, it's hard for me to comment on the results. It would be helpful if you put some code in SorterTest to actually do a timing comparison of the same list sorted with different numbers of threads.

      Delete
  4. Are we expected to have specific results with efficiency? Through my tests, MergeSort has been giving me faster times than QuickSort (100 tests of list sizes of 1 million all averaged)with MergeSort being about .2 seconds faster.

    I just thought I heard quick sort was supposed to be a faster way of sorting. Is this true? Or are these results fine?

    ReplyDelete
    Replies
    1. As far as I can tell, there are no efficiency requirements specified in the spec. However, your code should take no more than a few seconds to run on 1 million elements. This means if you are using the code provided with the MergeSort lab you will need to modify the sortMergedLists method which is EXTREMELY inefficient.

      Delete
    2. @Joseph Collard: Really? I guess looking at it now, I can see that, but right now, running off of the CS servers, it's mergesorting in about 20 ms. Is that supposed to be slow?

      Delete
    3. @Max: How many elements are in the list you are sorting? Using the mergeSortingLists method will cause your algorithm to take O(n^2) time. On the CS machines with 1 million elements, it takes well over a minute.

      Delete
    4. @Joe: I'm sorting 1,000,000 elements. It's taking like 60 ms for Integers at the moment, and other types of elements like Strings are taking about 100-200 ms. Looking at my code now, I did change
      while(!leftList.isEmpty()) list.add(leftList.remove(0));
      to
      if(!leftList.isEmpty()) list.addAll(leftList);
      and same with the rightList. I'm not sure if that would've increased the speed dramatically, but I realize now that I actually did change the mergeSortedLists method.

      Delete
    5. This will make a dramatic difference. When you use remove(0) on an ArrayList, the entire array list must be copied each time you make this call. Because you are making this call for every element in the list, you are copying your list roughly n times where n is the number of elements in the list.

      Delete
    6. I've run into a similar issue. I've confirmed that my threaded implementation of merge sort and quick sort work properly. When sorting 100,000 integers, both algorithms take, on average, 150ms to 75ms to complete. In this case quick sort usually wins by a margin of about 50ms. When I bump N up to one million, merge sort takes less than 500ms and quick sort takes around 2 seconds.

      I just now noticed that if I increase the range of my input data to integers between 0 and 9999 (rather than 0 to 999), quick sort is faster, but still slower than merge sort. If I increase n to above 1 million, quicksort becomes slower than mergesort again.

      Any insight? (TA's my code is in the repo).

      Delete
    7. I wish we could edit posts...

      Revised second paragraph:

      I just now noticed that if I increase the range of my input data to integers between 0 and 9999 (rather than 0 to 999), quick sort is faster than merge sort. If I increase n to above 1 million, quicksort becomes slower than mergesort again.

      Delete
    8. This sounds like correct behavior. QuickSort does at best n * logn operations. However, worst case scenario it does n^2. So, if you have many elements that are the same value, it will not do ver well. In your case, you have approximately 1000 of each of the elements in the list. Try removing the variable that is determining values between 0 and 9999 and let them be any int value. You will see a performance gain in QuickSort... it may not be faster than MergeSort but it should be faster than when you have a less "random" setup.

      Delete
    9. Thanks! I also further improved both merge and quick sort by using insertion sort for very small lists.

      Delete
  5. This comment has been removed by the author.

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. Max, The version that is currently in your repository is only using 100,000 elements and merge sort is taking over 3 seconds. When you bump it up to 1,000,000 it will take several minutes to finish.

      Delete
    3. Sorry about that, I just updated it to 1,000,000 elements. Here's a screenshot of it running from my computer on the CS server.
      http://i.imgur.com/rbVA0.png

      Delete
    4. @Joseph: You were right. I just figured out something was happening where lots of elements were getting deleted, so the end list that had been sorted would only be a few elements long. No wonder it was going so incredibly fast. Now it's taking about the time you said it would. Sorry for the mix up. I feel like an idiot now

      Delete
  6. I've run into an issue while multi-threading my code. I've cut my lists down to 100,000 elements at the moment for testing. My ThreadedMergeSorter seemed to be doing pretty well on the CS server, so I used the same idea behind it on my ThreadedQuickSorter, but it's not working very well. It will sort my integers in about 16 seconds compared to the 100 or 200 ms time for the regular QuickSorter, but once it gets to the characters, it just busts completely. I've been looking at it for a while and can't figure out what the problem is. If you want to change the number of threads each sorter is using, just go to the SorterTest and you can change the maxMST and maxQST variables that are right next to the listSize variable. Thanks

    ReplyDelete
    Replies
    1. Max, I have not looked at your code. But, you might want to look for the following:

      Are you partitioning correctly? Does each thread get the correct chunk to quicksort?

      Are you creating more threads than the machine has processors? This will actually slow it down. Try making just 2 threads and seeing how it does.

      Delete
  7. To optimize my MergeSort algorithm, I've been working on a version that doesn't create so many copies of the ArrayList from using the .remove() method. Now however, I get this error when I run my code - which means something is getting lost in translation.

    'Exception in thread "main" java.lang.RuntimeException: Sort failed on 9 > 5
    at edu.unm.cs.cs251spr12.batch.lab.Sorter.SorterTest.main(SorterTest.java:56)'

    I believe the problem has to do with my mergeSort() method as when I replace the method with my old version, the sort works fine - it just takes forever; I'm fairly confident the mergeSortedLists() method should be fine. I just committed the new version to my repository so any feedback would be greatly appreciated.
    Thanks

    ReplyDelete
    Replies
    1. Don't use .remove() on ArrayLists (see above posts). You can iterate through an array list and copy elements to another like this: listCopy.add( list.get(i) ); This takes O(n) time, since ArrayList.get(i) is constant time.

      If we do this to populate the left and right halves in merge sort, we have to .clear() the original list before we add the sorted halves back.

      Delete
    2. Thanks for your reply Rob. This is essentially the approach I'm taking right now, however I have not used .clear() before I merge the sorted halves back together. I'll give that a shot!

      Delete
    3. Got it working fine now. Thanks for the help, sir.

      Delete
  8. Are any TAs will be in the FEC computer lab tomorrow (8 May) around 5:00 PM for help?

    ReplyDelete
  9. I will be tutoring from 5 - 8pm tomorrow Tuesday May 8th. Also, I will be tutoring Wednesday from 3 to 6pm.

    ReplyDelete
  10. I am getting StackOverflow exceptions when my project runs both threaded and non-threaded quicksort with 1,000,000 elements of different object type. It seems that it was throwing this type of exception in the partition method. The method returns the index of the pivot for each recursion of quicksort. I was able to sort 1 million elements in merge sort without any problems. What suggestions should I approach to fix this problem?

    ReplyDelete
  11. The recursion in quicksort can go as much as N levels deep, if your pivot choices are worst-possible. Check to see if this is what's happening. To protect yourself against this, it's best to choose a random element for each pivot. Then you are *extremely* unlikely to have recursion depth more than 2*log(N).

    ReplyDelete
    Replies
    1. How would I know if the recursion depth is 2*log(n) instead of n*log(n)?

      Delete
    2. The easiest way to find out the recursion depth would be to increment a member variable at the start of each call to sort( ), print it, do the method, and decrement the variable just before returning.

      Brandon, the phrasing of your questions makes me think perhaps you aren't clear about what a stack overflow is. It means your method calls are getting nested too many levels deep. As such, it doesn't really matter what the last method that pushed you over the limit was--the problem is in the total number of method calls being done before returning.

      Delete
    3. I have finally fixed the problem with the Stack Overflow exception by implementing a simpler quick sort algorithm. I am able to sort a million elements with no exceptions thrown.

      However, I am noticing that two threaded sorts would not perform faster than the non-threaded sort. I am using two threads to sort half of the list for both the merge and quick sorts. I'm running out of ideas to make it go fast.

      Delete
  12. I'm having some problems with the multi-threaded versions of my code. The MergeSort gets trapped in a loop. I've been working at this all day and I can't get it figured out. Do any of the TAs have suggestions – I just uploaded my most recent files.
    Are any of the TAs going to be in the FEC computer lab tomorrow?

    ReplyDelete
    Replies
    1. @Brandon, I will email you some suggestions on your code.

      Delete
    2. Thank you.
      Here's my unm email, in case you don't have it: batch at unm dot edu

      Delete
  13. I'm confused about this line in the spec:

    "The sort method should modify the list it is passed, so that it contains the same elements but sorted into increasing order..."

    So does this mean that the quicksorter and mergesorter methods should have void return types? the way I have them implemented now is that they modify the data they're given, sort it and return ArrayList as the sorted list, and in the main method I have data = QuickSorter(data);. I can't figure out how I would get it to modify the original data.

    ReplyDelete
    Replies
    1. @Zymbaluk:

      The method is called sort( ). See page 1 of the spec for the signature.

      Rather than "data = QuickSorter(data)" your syntax should be more like
      Sorter qs = new QuickSorter( );
      qs.sort( data );

      I suggest you have a further look at the example code in SorterTest.java

      Delete