Bootstrap Method for Streaming Data
Perasut Rungcharassang
Stage 1 : Typical statistical methods work with static data sets. A static data set has the following properties: it does not change over time, its size is fixed (so it can be stored), it follows a clear distribution (such as the normal or uniform distribution), and so on. Statistical values (mean, standard deviation, etc.) are computed from the whole static data set. In recent years, however, the form of data has changed. Many applications need to work with non-static data sets, which are called data streams or streaming data. Their properties are the opposite of those of a static data set.
Stage 2 : Efron (1979) introduced the bootstrap method, a statistical tool for estimating statistical values. The bootstrap is a simple method for estimating the sampling distribution of a statistic computed from sample data. It generates many resamples by sampling the original training data with replacement, and these resamples are used to approximate the sampling distribution. The bootstrap method is applied when little statistical information about the data set is known, when only a small amount of data is available, or when standard methods cannot be applied. It has been used in several problems, such as signal processing (Zoubir & Boashash, 1998; Zoubir & Iskander, 2007), the class imbalance problem (Thanathamathee & Lursinsap, 2013), etc.
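The resampling step described above can be sketched in a few lines of Python. This is only a minimal illustration of the classic bootstrap on a small static sample, not the method proposed in this paper. The function names (bootstrap_resamples, bootstrap_standard_error), the sample values, the number of resamples, and the seed are all assumptions chosen for the example.

```python
import random
import statistics


def bootstrap_resamples(data, n_resamples, rng):
    """Draw n_resamples resamples, each the same size as data, with replacement."""
    n = len(data)
    return [rng.choices(data, k=n) for _ in range(n_resamples)]


def bootstrap_standard_error(data, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` from bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = [statistic(r) for r in bootstrap_resamples(data, n_resamples, rng)]
    # The spread of the statistic across resamples approximates
    # the spread of its sampling distribution.
    return statistics.stdev(estimates)


# Hypothetical sample values, used only for illustration.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
se_mean = bootstrap_standard_error(sample, statistics.mean)
print(f"bootstrap standard error of the mean: {se_mean:.3f}")
```

Note that every resample is drawn from the complete sample, so the whole data set has to be held in memory.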
Stage 3 : The original bootstrap method needs the whole data set in order to generate many resamples. However, the data set may be huge, so the computation takes more time and more storage. Since the data of interest in this paper is streaming data, the original bootstrap method cannot be applied to the whole stream.
Stage 4&5 : The purpose of this paper is to improve the original bootstrap method so that it can be applied to classifying streaming data.
My comments on my friends' blogs :
#1
http://sornjarodoonsiri.blogspot.com/2015/01/introduction.html?showComment=1423053397044#c1028524126658686890
#2
http://suwatthikul.blogspot.com/2015/02/assignment2-writing-introduction.html?showComment=1423055970697#c6274032821322178279
The purpose of this paper is to improve the original bootstrap method and apply it to classifying streaming data, in order to bring real-time analysis capabilities to the system, with the ability to take immediate process-based action on the discovered insights.
(I am not sure about the benefit of the bootstrap method. I think that what follows "in order to" should be the value to others.)
Thank you so much. ^ ^
In my opinion: in stage 1, "This type of the non-static data set can be...", I would cut "This type of" and change it to "The non-static data set can be...".
In stage 4&5, "in order to apply to classifying", maybe I would cut "apply to".
Thank you so much. ^ ^