Minor Project
My research question is whether the traditional bootstrap method can be adapted to classify streaming data. In many applications, such as face detection, the incoming data is so large that it may be impossible to store and classify all of it. Several methods have therefore been researched and developed to handle this problem.
Researchers who have looked at this subject include Kleiner et al. and Wang et al. The former proposed a way to apply the bootstrap method to large-scale data, and the latter adapted Kleiner's study to the clustering problem.
Kleiner et al. (2012) proposed the Bag of Little Bootstraps (BLB), which combines the original bootstrap method with a sub-sampling technique in order to reduce the computation in the bootstrap process. In their words, "BLB only requires repeated computation on small subsets of the original dataset and avoids the bootstrap's problematic need for repeated computation of estimates on re-samples."
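To make the BLB idea concrete, here is a minimal Python sketch (assuming NumPy) that estimates the standard error of a sample mean. Following the BLB scheme, each small subset has b = n^gamma points, and each resample is represented by multinomial counts over those b points that sum to n, so a full-size resample is simulated while only b distinct values are ever processed. The choice of the mean as the estimator, the standard deviation as the quality assessment, the hypothetical function name `blb_stderr` and the default values of `gamma`, `s` and `r` are my own illustrative assumptions, not settings taken from Kleiner et al.

```python
import numpy as np


def blb_stderr(data, gamma=0.7, s=10, r=50, seed=0):
    """Sketch of a Bag of Little Bootstraps estimate of the standard error
    of the mean. Hypothetical defaults: gamma sets the subset size
    b = n**gamma, s is the number of subsets, r the resamples per subset."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.size
    # Subset size b = n**gamma, kept between 2 and n.
    b = max(2, min(n, int(n ** gamma)))

    subset_errors = []
    for _ in range(s):
        # Draw one small subset of b points without replacement.
        subset = rng.choice(data, size=b, replace=False)
        estimates = np.empty(r)
        for k in range(r):
            # Simulate a full-size (n-point) resample: multinomial counts
            # over the b subset points, summing to n.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            # Weighted mean = mean of the simulated n-point resample.
            estimates[k] = np.average(subset, weights=counts)
        # Quality assessment for this subset: spread of the r estimates.
        subset_errors.append(estimates.std(ddof=1))

    # BLB combines the per-subset assessments by averaging them.
    return float(np.mean(subset_errors))
```

For example, `blb_stderr(np.random.default_rng(1).normal(size=100_000))` should return a value close to 1/sqrt(100000), about 0.0032, which is the usual standard error of the mean for standard normal data. The expensive step, computing an estimate on a resample, only ever touches b of the n points, which is the saving the quotation above describes.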
Wang et al. (2014) proposed the Bag of Little Bootstraps Clustering (BLBC), which is inspired by the BLB technique and combines clustering with Kleiner's study. BLBC reduces the total computation needed to cluster massive (very large) data.
Debate on this issue has shown that the bootstrap can be applied to very large datasets, but neither study addresses streaming data. Massive data (or big data) and streaming data differ in one important detail: a massive dataset is fixed and can be revisited, whereas streaming data keeps arriving and may be impossible to store in full.
My work will be closer to Wang's because I would like to improve the original bootstrap method in order to classify streaming data. I will use BLBC's idea of inserting a clustering method into the bootstrap as a starting point for a new approach to my classification problem.
Hopefully, my contribution will be a proposed method that maintains statistical correctness and achieves high accuracy on the classification problem.
Reference List (proceedings)
Kleiner, A., Talwalkar, A., Sarkar, P., & Jordan, M. I. (2012). The big data bootstrap. In Proceedings of the 29th International Conference on Machine Learning (pp. 1759-1766). Scotland: Omnipress.
Wang, H., Zhuang, F., Ao, X., He, Q., & Shi, Z. (2014). Scalable bootstrap clustering massive data. In 2014 15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (pp. 1-6). Las Vegas: CPS.
My comments on my friends' blogs:
#1 http://qquanta.blogspot.com/2015/02/minor-project-before-midterm.html?showComment=1424882247626#c3691086820542128134
#2 http://kanokudomsit.blogspot.com/2015/02/minor-project.html?showComment=1424882848921#c7303694604604936908