Data Mining

50 selected papers in Data Mining and Machine Learning

Here is a list of 50 selected papers in Data Mining and Machine Learning. You can download them for detailed reading and research. Enjoy!

General

Data Mining and Statistics: What’s the Connection?

Data Mining: Statistics and More?, D. Hand, American Statistician, 52(2):112-118.

Data Mining, G. Weiss and B. Davison, in Handbook of Technology Management, John Wiley and Sons, expected 2010.

From Data Mining to Knowledge Discovery in Databases, U. Fayyad, G. Piatetsky-Shapiro & P. Smyth, AI Magazine, 17(3):37-54, Fall 1996.

Mining Business Databases, Communications of the ACM, 39(11): 42-48.

10 Challenging Problems in Data Mining Research, Q. Yang and X. Wu, International Journal of Information Technology & Decision Making, Vol. 5, No. 4, 2006, 597-604.

The Long Tail, by Anderson, C., Wired magazine.

AOL’s Disturbing Glimpse Into Users’ Lives, by McCullagh, D., News.com, August 9, 2006

General Data Mining Methods and Algorithms

Top 10 Algorithms in Data Mining, X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z. Zhou, M. Steinbach, D. J. Hand, D. Steinberg, Knowledge and Information Systems, 14:1-37, 2008.

Induction of Decision Trees, R. Quinlan, Machine Learning, 1(1):81-106, 1986.

Web and Link Mining

The PageRank Citation Ranking: Bringing Order to the Web, L. Page, S. Brin, R. Motwani, T. Winograd, Technical Report, Stanford University, 1999.

The Structure and Function of Complex Networks, M. E. J. Newman, SIAM Review, 2003, 45, 167-256.

Link Mining: A New Data Mining Challenge, L. Getoor, SIGKDD Explorations, 2003, 5(1), 84-89.

Link Mining: A Survey, L. Getoor, SIGKDD Explorations, 2005, 7(2), 3-12.

Semi-supervised Learning

Semi-Supervised Learning Literature Survey, X. Zhu, Computer Sciences TR 1530, University of Wisconsin — Madison.

Introduction to Semi-Supervised Learning, in Semi-Supervised Learning (Chapter 1), O. Chapelle, B. Schölkopf, A. Zien (eds.), MIT Press, 2006. (Fordham’s library has online access to the entire text.)

Learning with Labeled and Unlabeled Data, M. Seeger, University of Edinburgh (unpublished), 2002.

Person Identification in Webcam Images: An Application of Semi-Supervised Learning, M. Balcan, A. Blum, P. Choi, J. Lafferty, B. Pantano, M. Rwebangira, X. Zhu, Proceedings of the 22nd ICML Workshop on Learning with Partially Classified Training Data, 2005.

Learning from Labeled and Unlabeled Data: An Empirical Study across Techniques and Domains, N. Chawla, G. Karakoulas, Journal of Artificial Intelligence Research, 23:331-366, 2005.

Text Classification from Labeled and Unlabeled Documents using EM, K. Nigam, A. McCallum, S. Thrun, T. Mitchell, Machine Learning, 39, 103-134, 2000.

Self-taught Learning: Transfer Learning from Unlabeled Data, R. Raina, A. Battle, H. Lee, B. Packer, A. Ng, in Proceedings of the 24th International Conference on Machine Learning, 2007.

An iterative algorithm for extending learners to a semisupervised setting, M. Culp, G. Michailidis, 2007 Joint Statistical Meetings (JSM), 2007

Partially-Supervised Learning / Learning with Uncertain Class Labels

Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers, V. Sheng, F. Provost, P. Ipeirotis, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.

Logistic Regression for Partial Labels, in 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Volume III, pp. 1935-1941, 2002.

Classification with Partial Labels, N. Nguyen, R. Caruana, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.

Imprecise and Uncertain Labelling: A Solution based on Mixture Model and Belief Functions, E. Come, 2008 (PowerPoint slides).

Induction of Decision Trees from Partially Classified Data Using Belief Functions, M. Bjanger, Norwegian University of Science and Technology, 2000.

Knowledge Discovery in Large Image Databases: Dealing with Uncertainties in Ground Truth, P. Smyth, M. Burl, U. Fayyad, P. Perona, KDD Workshop 1994, AAAI Technical Report WS-94-03, pp. 109-120, 1994.

Recommender Systems

Trust No One: Evaluating Trust-based Filtering for Recommenders, J. O’Donovan and B. Smyth, In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), 2005, 1663-1665.

Trust in Recommender Systems, J. O’Donovan and B. Smyth, In Proceedings of the 10th International Conference on Intelligent User Interfaces (IUI-05), 2005, 167-174.

Learning from Imbalanced Data Sets

General resources available on this topic:

ICML 2003 Workshop: Learning from Imbalanced Data Sets II

AAAI 2000 Workshop on Learning from Imbalanced Data Sets

Papers

A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data, G. Batista, R. Prati, and M. Monard, SIGKDD Explorations, 6(1):20-29, 2004.

Class Imbalance versus Small Disjuncts, T. Jo and N. Japkowicz, SIGKDD Explorations, 6(1): 40-49, 2004.

Extreme Re-balancing for SVMs: a Case Study, B. Raskutti and A. Kowalczyk, SIGKDD Explorations, 6(1):60-69, 2004.

A Multiple Resampling Method for Learning from Imbalanced Data Sets, A. Estabrooks, T. Jo, and N. Japkowicz, in Computational Intelligence, 20(1), 2004.

SMOTE: Synthetic Minority Over-sampling Technique, N. Chawla, K. Bowyer, L. Hall, and W. Kegelmeyer, Journal of Artificial Intelligence Research, 16:321-357, 2002.

Generative Oversampling for Mining Imbalanced Datasets, A. Liu, J. Ghosh, and C. Martin, Third International Conference on Data Mining (DMIN-07), 66-72.

Learning from Little: Comparison of Classifiers Given Little Training, G. Forman and I. Cohen, in 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, 161-172, 2004.

Issues in Mining Imbalanced Data Sets – A Review Paper, S. Visa and A. Ralescu, in Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, pp. 67-73, 2005.

Wrapper-based Computation and Evaluation of Sampling Methods for Imbalanced Datasets, N. Chawla, L. Hall, and A. Joshi, in Proceedings of the 1st International Workshop on Utility-based Data Mining, 24-33, 2005.

C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling, C. Drummond and R. Holte, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

C4.5 and Imbalanced Data sets: Investigating the effect of sampling method, probabilistic estimate, and decision tree structure, N. Chawla, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Class Imbalances: Are we Focusing on the Right Issue?, N. Japkowicz, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Learning when Data Sets are Imbalanced and When Costs are Unequal and Unknown, M. Maloof, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Uncertainty Sampling Methods for One-class Classifiers, P. Juszczak and R. Duin, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Active Learning

Improving Generalization with Active Learning, D. Cohn, L. Atlas, and R. Ladner, Machine Learning 15(2), 201-221, May 1994.

On Active Learning for Data Acquisition, Z. Zheng and B. Padmanabhan, In Proc. of IEEE Intl. Conf. on Data Mining, 2002.

Active Sampling for Class Probability Estimation and Ranking, M. Saar-Tsechansky and F. Provost, Machine Learning 54:2 2004, 153-178.

The Learning-Curve Sampling Method Applied to Model-Based Clustering, C. Meek, B. Thiesson, and D. Heckerman, Journal of Machine Learning Research 2:397-418, 2002.

Active Sampling for Feature Selection, S. Veeramachaneni and P. Avesani, Third IEEE Conference on Data Mining, 2003.

Heterogeneous Uncertainty Sampling for Supervised Learning, D. Lewis and J. Catlett, In Proceedings of the 11th International Conference on Machine Learning, 148-156, 1994.

Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction, G. Weiss and F. Provost, Journal of Artificial Intelligence Research, 19:315-354, 2003.

Active Learning using Adaptive Resampling, KDD 2000, 91-98.

Cost-Sensitive Learning

Types of Cost in Inductive Concept Learning, P. Turney, In Proceedings Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning.

Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection, P. Chan and S. Stolfo, KDD 1998.
