Wednesday, February 25, 2015

Minor Project



            My research question is whether the traditional bootstrap method can be adapted to classify streaming data. In many applications, such as face detection, the incoming data is so large that it may be impossible to store and classify all of it. Several methods have therefore been researched and developed to handle this problem.

            Researchers who have looked at this subject include Kleiner et al. and Wang et al. The former proposed a way to apply the bootstrap method to large-scale data, and the latter adapted Kleiner et al.'s work to the clustering problem.

            Kleiner et al. (2012) proposed the Bag of Little Bootstraps (BLB), which combines the original bootstrap method with a sub-sampling technique in order to reduce the computation in the bootstrap process. “BLB only requires repeated computation on small subsets of the original dataset and avoids the bootstrap’s problematic need for repeated computation of estimates on re-samples,” they said.
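To make the BLB idea concrete, here is a minimal Python sketch based on my understanding of the procedure: draw a few small subsets of size b = n^gamma, resample each one up to the full size n by drawing multinomial counts, measure the spread of the estimates within each subset, and average those spreads. The names blb_standard_error and weighted_mean, and the default values of gamma, n_subsets and n_resamples, are my own illustrative choices and are not taken from the paper.

```python
import numpy as np


def weighted_mean(values, counts):
    """Mean of a resample described by how often each subset point was drawn."""
    return float(np.sum(values * counts) / np.sum(counts))


def blb_standard_error(data, estimator, gamma=0.6, n_subsets=10,
                       n_resamples=50, seed=0):
    """BLB-style estimate of the standard error of `estimator` (a sketch).

    `estimator(values, counts)` must accept the subset points and their
    resampling counts; all parameter names here are assumptions.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    b = int(n ** gamma)  # subset size, much smaller than n
    subset_errors = []
    for _ in range(n_subsets):
        # Small subset of b distinct points, drawn without replacement.
        subset = rng.choice(data, size=b, replace=False)
        estimates = []
        for _ in range(n_resamples):
            # A size-n resample of b points is fully described by
            # multinomial counts, so only b numbers are ever stored.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates.append(estimator(subset, counts))
        # Spread of the estimates within this subset.
        subset_errors.append(np.std(estimates, ddof=1))
    # Average the per-subset assessments.
    return float(np.mean(subset_errors))


if __name__ == "__main__":
    sample = np.random.default_rng(1).normal(0.0, 1.0, size=10_000)
    print(blb_standard_error(sample, weighted_mean))
```

For the sample mean of 10,000 standard normal values, the result should be close to the theoretical standard error 1/sqrt(10000) = 0.01, even though each inner step only touches about 250 distinct points.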

            Wang et al. (2014) proposed the Bag of Little Bootstraps Clustering (BLBC), which is inspired by the BLB technique and combines clustering with Kleiner et al.'s approach. BLBC decreases the total computation needed to cluster massive (very large) data.

Both studies show that the bootstrap can be applied to very large datasets, but neither of them addresses streaming data. Massive data (or big data) and streaming data differ in one detail: a massive dataset is fully available when the analysis starts, whereas streaming data arrives continuously and, as noted above, may be too large to store in full.

My work will be closer to Wang et al.'s because I would like to improve the original bootstrap method in order to classify streaming data. I will use BLBC's idea of how to insert a clustering method into the bootstrap as a starting point for a new idea in my classification research.

Hopefully, my contribution will be a method that maintains statistical correctness and achieves high accuracy on the classification problem.

Reference List (proceedings)

Kleiner, A., Talwalkar, A., Sarkar, P., & Jordan, M. I. (2012). The big data
        bootstrap. Proceedings of the 29th International Conference
        on Machine Learning (pp. 1759-1766). Scotland: Omnipress.

Wang, H., Zhuang, F., Ao, X., He, Q., & Shi, Z. (2014). Scalable bootstrap
        clustering massive data. 2014 15th IEEE/ACIS International Conference
        on Software Engineering, Artificial Intelligence, Networking and
        Parallel/Distributed Computing (SNPD) (pp. 1-6). Las Vegas: CPS.


My comments on my friends' blogs:

#1 http://qquanta.blogspot.com/2015/02/minor-project-before-midterm.html?showComment=1424882247626#c3691086820542128134

#2 http://kanokudomsit.blogspot.com/2015/02/minor-project.html?showComment=1424882848921#c7303694604604936908
