Abstract
Most of the earlier work on clustering has focused on numeric data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve data sets that consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means. The object similarity measure is derived from both numeric and categorical attributes. When applied to numeric data, the algorithm is identical to k-means. The main result of this paper is a method to update the 'cluster centers' of objects described by mixed numeric and categorical attributes during the clustering process so as to minimise the clustering cost function. The clustering performance of the algorithm is demonstrated on two well-known data sets, namely the credit approval and abalone databases.
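The abstract does not spell out the similarity measure or the center update rule, so the following is only a minimal sketch under stated assumptions, not the paper's own formulation. It assumes squared Euclidean distance on the numeric attributes, a simple mismatch count on the categorical attributes, and a hypothetical weight `gamma` balancing the two; centers are assumed to be updated with the mean of each numeric attribute and the most frequent value of each categorical attribute. The function names `mixed_distance` and `update_center` are illustrative.

```python
from collections import Counter


def mixed_distance(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Assumed dissimilarity: squared Euclidean distance on numeric values
    plus gamma times the number of mismatched categorical values."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    categorical = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * categorical


def update_center(members):
    """Assumed update rule for one cluster.

    members: non-empty list of (numeric_tuple, categorical_tuple) pairs.
    Returns (means of numeric attributes, modes of categorical attributes).
    """
    n = len(members)
    num_dim = len(members[0][0])
    cat_dim = len(members[0][1])
    # Mean of each numeric attribute, as in ordinary k-means.
    c_num = tuple(sum(m[0][j] for m in members) / n for j in range(num_dim))
    # Most frequent value of each categorical attribute.
    c_cat = tuple(
        Counter(m[1][j] for m in members).most_common(1)[0][0]
        for j in range(cat_dim)
    )
    return c_num, c_cat
```

With no categorical attributes the categorical term vanishes and the update reduces to the mean, which is consistent with the statement that the algorithm is identical to k-means on purely numeric data.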
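Publication date
April 14, 2003 (date first posted online on the China Journal Net platform; this does not represent the paper's publication date)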