Sampling Methods in Genetic Programming Learners from Large Datasets: A Comparative Study
hal.structure.identifier | Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE] | |
dc.contributor.author | Hmida, Hmida | |
hal.structure.identifier | Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE] | |
dc.contributor.author | Ben Hamida, Sana (HAL ID: 177299; ORCID: 0000-0003-4202-613X) | |
hal.structure.identifier | Laboratoire d'Informatique, Programmation, Algorithmique et Heuristique [LIPAH] | |
dc.contributor.author | Borgi, Amel | |
hal.structure.identifier | Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE] | |
dc.contributor.author | Rukoz, Marta | |
dc.date.accessioned | 2019-03-18T14:04:13Z | |
dc.date.available | 2019-03-18T14:04:13Z | |
dc.date.issued | 2017 | |
dc.identifier.uri | https://basepub.dauphine.fr/handle/123456789/18535 | |
dc.language.iso | en | en |
dc.subject | Sampling | en |
dc.subject | machine learning | en |
dc.subject | decision support systems | en |
dc.subject | Big data | en |
dc.subject.ddc | 006.3 | en |
dc.title | Sampling Methods in Genetic Programming Learners from Large Datasets: A Comparative Study | en |
dc.type | Conference paper | |
dc.description.abstracten | The amount of data available for data mining and knowledge discovery continues to grow very fast in the era of Big Data. Genetic Programming (GP) algorithms, which are efficient machine learning techniques, face a new challenge: dealing with the mass of provided data. Active Sampling, already used for Active Learning, might be a good solution for improving the training of Evolutionary Algorithms (EA) on very large datasets. This paper investigates the adaptation of Topology Based Selection (TBS) to massive learning datasets by means of Hierarchical Sampling. We propose combining Random Subset Selection (RSS) with TBS to create the RSS-TBS method. Two variants are implemented and applied to the KDD intrusion detection problem. They are compared to the original RSS and TBS techniques. The experimental results show that the high computational cost incurred by the original TBS on large datasets can be reduced with Hierarchical Sampling. | en |
dc.identifier.citationpages | 50-60 | en |
dc.relation.ispartoftitle | Advances in Big Data : Proceedings of the 2nd INNS Conference on Big Data, October 23-25, 2016, Thessaloniki, Greece | en |
dc.relation.ispartofeditor | Angelov, Plamen | |
dc.relation.ispartofeditor | Manolopoulos, Yannis | |
dc.relation.ispartofeditor | Iliadis, Lazaros | |
dc.relation.ispartofeditor | Roy, Asim | |
dc.relation.ispartofeditor | Vellasco, Marley | |
dc.relation.ispartofpublname | Springer International Publishing | en |
dc.relation.ispartofpublcity | Cham | en |
dc.relation.ispartofdate | 2017 | |
dc.relation.ispartofpages | 348 | en |
dc.subject.ddclabel | Artificial intelligence | en |
dc.relation.ispartofisbn | 978-3-319-47897-5 | en |
dc.relation.conftitle | 2nd INNS Conference on Big Data | en |
dc.relation.confdate | 2016-10 | |
dc.relation.confcity | Thessaloniki | en |
dc.relation.confcountry | Greece | en |
dc.relation.forthcoming | no | en |
dc.identifier.doi | 10.1007/978-3-319-47898-2_6 | en |
dc.description.ssrncandidate | no | en |
dc.description.halcandidate | yes | en |
dc.description.readership | research | en |
dc.description.audience | International | en |
dc.relation.Isversionofjnlpeerreviewed | no | en |
dc.date.updated | 2019-03-18T13:43:04Z | |
hal.faultCode | {"duplicate-entry":{"hal-01429907":{"doi":"1.0"}}} | |
hal.author.function | aut | |
hal.author.function | aut | |
hal.author.function | aut | |
hal.author.function | aut |