Some 13 years ago, Stanford statistician D. Donoho predicted that the 21st century would be the century of data. "We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet." -- D. Donoho, 2000.

Indeed, unprecedented technological advances have led to increasingly high-dimensional data sets in all areas of science, engineering, and business. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, web, and market basket analysis, among many others. The number of features in such data is often of the order of thousands or millions -- much larger than the available sample size. This renders classical data analysis methods inadequate, questionable, or inefficient at best, and calls for new approaches. Some of the manifestations of this curse of dimensionality are the following:

- High-dimensional geometry defeats our intuition, rooted in low-dimensional experience, so that data presentation and visualisation become particularly challenging.
- Distance concentration is the phenomenon in high-dimensional probability spaces whereby the contrast between pairwise distances vanishes as the dimensionality increases -- this makes distances meaningless and affects all methods that rely on a notion of distance (see the sketch after this list).
- Bogus correlations and misleading estimates may result when trying to fit complex models whose effective dimensionality is too large compared to the number of data points available.
- The accumulation of noise may confound our ability to find the low-dimensional intrinsic structure hidden in the high-dimensional data.
- The computational cost of processing high-dimensional data is often prohibitive.
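To make the distance concentration point concrete, the following is a minimal, illustrative Python sketch (not part of the original call): it samples points uniformly from the unit hypercube and shows how the relative contrast between the smallest and largest pairwise Euclidean distances shrinks as the dimension grows. The sample size, the list of dimensions, and the random seed are arbitrary choices made only for illustration.

```python
# Illustrative sketch of distance concentration.
# Assumed setup: n = 100 points sampled uniformly from the unit hypercube [0, 1]^d.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n = 100  # sample size

for d in [2, 10, 100, 1000, 10000]:
    X = rng.uniform(size=(n, d))   # n random points in [0, 1]^d
    dists = pdist(X)               # all pairwise Euclidean distances
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d = {d:5d}   relative contrast (max - min) / min = {contrast:.3f}")
```

As d increases, the printed contrast falls towards zero, which is precisely why nearest-neighbour search and other distance-based methods degrade in very high dimensions.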