Category Archives: Machine Learning

Pros and Cons of Automated Machine Learning

Machine learning is a subfield of computer science that deals with computers or applications that are not explicitly coded to perform a task. Combined with cognitive technology and AI, it will make processing large volumes of data and information much smoother.

Machine learning is an application of AI (artificial intelligence) that enables machines or software to adapt and learn on their own, provided the data is rich and meaningful. Put simply, the effort is aimed at developing expert systems.

There are three main categories of machine learning: supervised learning, unsupervised learning, and reinforcement learning.

Machine learning is put into practice because it can deliver results faster and more accurately. Engineers work day and night to predict, classify, and cluster data. Machine learning is the player sent onto the pitch of data and big data to tackle these problems.

Continue reading

Swap Training and Test Data During Cross-Validation in scikit-learn

Last updated on October 11, 2018

Scikit-learn is a well-known Python machine learning library. It provides various utilities for machine learning, including those for cross-validation. In a standard \(K\)-fold cross-validation, the data are split into \(K\) subsets of (approximately) equal size. There are \(K\) rounds of training and testing. In each round, one subset is used as test data and all other subsets are used as training data. Under this setup, as long as \(K > 2\), there is always more training data than test data in each round of the cross-validation. While this is desirable in most cases, some machine learning applications call for less training data than test data. For example, in graph embedding, each node in the network has a vector representation and labels. When running cross-validation, it is preferable to train on fewer nodes than are used for testing, since this better mimics the real-world scenario in terms of the amount of available training data (e.g., here). In scikit-learn, we can achieve this by swapping training and test data.
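One way to sketch the swap is shown below; it is an illustration under stated assumptions rather than the only possible approach. `KFold.split` yields `(train_idx, test_idx)` pairs, so emitting each pair in reverse order gives rounds that train on one fold (\(1/K\) of the data) and test on the remaining \(K - 1\) folds. The helper name `swapped_kfold_splits`, the synthetic dataset, and the choice of `LogisticRegression` are hypothetical and used only for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score


def swapped_kfold_splits(n_samples, n_splits=5, seed=0):
    """Return K-fold splits with the training and test roles exchanged.

    Hypothetical helper for illustration: each round trains on a single
    fold and tests on the other K - 1 folds.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    placeholder = np.zeros(n_samples)  # KFold only needs the number of samples
    # KFold.split yields (train_idx, test_idx); emit them in reverse order.
    return [(test_idx, train_idx) for train_idx, test_idx in kf.split(placeholder)]


# Synthetic data, assumed purely for demonstration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

splits = swapped_kfold_splits(len(X), n_splits=5)

# cross_val_score accepts an iterable of (train, test) index arrays as `cv`.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=splits)

for round_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"round {round_no}: train={len(train_idx)}, test={len(test_idx)}")
print("mean accuracy:", scores.mean())
```

With 200 samples and \(K = 5\), every round trains on 40 samples and tests on 160. Because the swapped splits are passed as a plain list of index pairs, they should work with any scikit-learn utility that accepts an iterable for its `cv` argument.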

Continue reading