# Machine Learning - Science topic


Questions related to Machine Learning

HITON is an algorithm for Markov blanket discovery of a target variable. The algorithm first appeared in this paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1480117/

I am looking for an R implementation of an algorithm that finds the Markov blanket of a target variable.

Any reply would be greatly appreciated.

Thank you.

If I have 1000 instances of the false class and 50 instances of the true class, which practice gives a better result: selecting false and true examples in an equal ratio or in equal numbers?

I want to develop a prediction model (like time series forecasting) with a BPNN. The sigmoid function is mostly used as the activation function in BPNNs, but it gives an output between 0 and 1. If my expected output is something like 231.54, then how do I calculate the error and train the network? In short, I want my network to produce values like 231.54. What should the activation functions for the hidden and output layers be?
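One common workaround (a sketch, not from the question) is to min-max scale the targets into the sigmoid's range and invert the scaling at prediction time; alternatively, a linear output unit removes the restriction entirely. The toy targets below are illustrative:

```python
import numpy as np

# Toy targets in the range the question mentions (values like 231.54).
y = np.array([231.54, 250.0, 210.3, 299.9])

# Min-max scale targets into [0, 1] so a sigmoid output unit can fit them.
y_min, y_max = y.min(), y.max()
y_scaled = (y - y_min) / (y_max - y_min)

# After training, invert the scaling to recover values on the original scale.
y_recovered = y_scaled * (y_max - y_min) + y_min
```

Training proceeds as usual on the scaled targets; only the final predictions are mapped back.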

How do these three subspace methods perform with respect to classification?

We often use leave-one-subject-out cross-validation (LOSOXV) for machine learning experiments involving human subjects to allow for the subject-to-subject variation that occurs and also the tendency for autocorrelation for time series data involving a single subject. I am familiar with the paper by Kohavi recommending 10-fold XV but is there any equivalent paper that supports LOSOXV for this type of subject-dependent data?

Does anybody know a solver for a large scale sparse QP that works on the GPU?

Or, more in general, can a GPU speed up solvers for sparse QPs?

For example, price is non-stationary, but transforming it to the rate of return makes it stationary.

I am unable to write code for neural networks, as I have no coding support. I want to write code for prediction with neural networks. A simple coding example would help me understand how to build a network and how to train and test it.

E.g., the input values can take only finitely many states, say 10 symbols: {A, B, C, D, E, F, G, H, I, J}.
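For symbolic inputs like these, a standard network-ready encoding is one-hot: each of the 10 symbols maps to a 10-dimensional indicator vector. A minimal sketch (the symbol set is from the question; everything else is illustrative):

```python
import numpy as np

symbols = list("ABCDEFGHIJ")  # the 10 finite input states
index = {s: i for i, s in enumerate(symbols)}

def one_hot(seq):
    """Encode a symbol sequence as rows of a 10-dimensional 0/1 matrix."""
    out = np.zeros((len(seq), len(symbols)))
    for row, s in enumerate(seq):
        out[row, index[s]] = 1.0
    return out

X = one_hot(["A", "C", "J"])  # 3 x 10 input matrix for the network
```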

A large number of machine-learning models have been built to predict stock prices in the literature. What are the main reasons why no one has achieved success so far?

My dataset has four categories with 1100 reviews in each category. I want to apply a cross-validation method for finding the optimal value of K for KNN (sentiment classification). How many folds will be required?
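As a sketch of one common approach (not prescribed by the question), scikit-learn's GridSearchCV can select K by 10-fold cross-validation; the synthetic data below is a stand-in for the 4 x 1100 review features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data; the real 4-category review features would go here.
X, y = make_classification(n_samples=400, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)

# 10-fold CV is a common default; with 4400 samples, 5-fold also works.
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": [1, 3, 5, 7, 9, 11]},
                      cv=10)
search.fit(X, y)
best_k = search.best_params_["n_neighbors"]
```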

I'm looking for evaluation methods for data streams.

I want to evaluate the performance of a decision tree algorithm for data stream classification.

I need an example of user defined callable function for weights in:

sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, warn_on_equidistant=True, p=2)
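For reference, the weights parameter accepts any callable that takes an array of neighbor distances and returns an array of the same shape containing the weights. A hypothetical inverse-distance example (note the signature in the question is from an older scikit-learn; warn_on_equidistant was later removed):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def inverse_distance(distances):
    """User-defined weights callable: receives the neighbor-distance array
    and must return an array of the same shape with the weights."""
    return 1.0 / (distances + 1e-9)  # small constant avoids division by zero

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

clf = KNeighborsClassifier(n_neighbors=3, weights=inverse_distance)
clf.fit(X, y)
pred = clf.predict([[2.9]])  # nearest neighbors dominate via large weights
```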

The opinion on this diverges in the literature. Some claim that AIC does require the true model to be in the set of candidate models, that is, in the case of linear regression, the true model is a subset of the candidate predictors. Some others say that AIC does not require the truth. As far as I understand the derivation, the truth is not required and, in fact, we aim at the KL-optimal model as proxy to truth and the nature of truth is not relevant to the derivation at any point.

I heard that the SVM classifier for binary classification needs a small number of positive examples compared to other classifiers. Does anyone know the reason why? And is there any other benefit of using SVM as classifier?

Using a 4-layer ANN with 10 input neurons, 5 hidden neurons in two hidden layers, and an output layer with 2 neurons, the trained network generates almost identical outputs for different input patterns, differing by as little as 0.0000001. Does anyone know what the problem might be, and a possible fix?

Generally we deal with two types of forecasting: point and interval. With statistical methods of forecasting (like ARIMA), we can perform interval prediction that accounts for the uncertainty of the prediction. But what about doing the same with machine learning techniques like ANN and SVM?
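One option with machine learning models (a sketch of one approach, not the only one) is quantile regression: scikit-learn's GradientBoostingRegressor with loss="quantile" fits the lower and upper bounds of an interval directly. The synthetic data here is illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0,
                       random_state=0)

# Fit one model per quantile to get an approximate 90% prediction interval.
lower = GradientBoostingRegressor(loss="quantile", alpha=0.05,
                                  random_state=0).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95,
                                  random_state=0).fit(X, y)

lo, hi = lower.predict(X), upper.predict(X)  # interval endpoints per sample
```

For neural networks, analogous interval estimates can be obtained by training with a quantile (pinball) loss or by ensembling.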

In my research, I need to visualize each tree in a random forest in order to count the number of nodes in each tree. I use the R language to generate the random forest but couldn't find any command for this. Do you have any idea how I can visualize each tree in a random forest, or how I can calculate the number of nodes in each tree?
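In R, randomForest::getTree(rf, k) returns the k-th tree as a matrix with one row per node, so its row count gives the node count. If Python is an option, scikit-learn exposes the same count directly; a sketch on stand-in data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Each fitted estimator exposes its underlying tree structure.
node_counts = [est.tree_.node_count for est in forest.estimators_]
```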

Named-entity recognition: Better results using Support Vector Machine (SVM) compared to Conditional Random Field (CRF)?

Does anyone know if quality of results for Named-entity recognition improves by using an implementation for Support Vector Machines instead of Conditional Random Fields?

I'm using the J48 decision tree algorithm with Weka, but it doesn't build a tree.

It only says Number of Leaves: 1 and Size of the tree: 1.

I want to have information about the size of each tree in random forest (number of nodes) after training. I usually use WEKA but it seems it is unusable in this case.

I need to do clustering on a dataset composed of both numerical and categorical data. What algorithms do you recommend?

I have to explain the advantages and disadvantages of decision trees versus other classifiers.

Genetic information gathered from autistic patients is transformed into multidimensional data. This is huge and requires machine learning techniques to create an automated autism detection system. I wonder if there are publications along this track.

Basic image processing in R

In Matlab, one can invoke commands such as "im2double" and "mat2gray" to convert a bitmap into a numerical matrix and back again to an image.
I was wondering whether this can be achieved in R, maybe via additional packages.

I need some real data for experiment.

I need my network to produce outputs like 234, 231, ... (values in the range 200-300).

So I have created a 5-4-1 feed-forward network with a sigmoid activation function in the hidden layer and linear activation at the output layer. Are these the correct activation functions for my network?

And since I am using the sigmoid function at the hidden layer, the output of the hidden layer will be in [0, 1].

And the calculation at the hidden layer is the summation of all weights*inputs, followed by the sigmoid function y = 1/(1+e^-x). Is what I am thinking correct?
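That is the standard formulation. A minimal sketch of the forward pass described above, with the 5-4-1 shape from the question (the weights are random stand-ins):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 5))   # hidden-layer weights for a 5-4-1 network
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))   # linear output layer
b2 = np.zeros(1)

x = rng.normal(size=5)         # one input pattern
h = sigmoid(W1 @ x + b1)       # hidden activations, each in (0, 1)
y = W2 @ h + b2                # linear output, unbounded (can be 234, 231, ...)
```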

How to calculate 10-fold cross validation paired t-test for classification data?
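A common recipe is to run both classifiers on the same 10 folds and apply a paired t-test to the fold-wise accuracies. A sketch with SciPy, using made-up per-fold accuracies:

```python
import numpy as np
from scipy import stats

# Per-fold accuracies of two classifiers evaluated on the SAME 10 folds.
acc_a = np.array([0.81, 0.79, 0.84, 0.80, 0.78,
                  0.83, 0.82, 0.80, 0.79, 0.81])
acc_b = np.array([0.75, 0.74, 0.78, 0.76, 0.73,
                  0.77, 0.76, 0.74, 0.75, 0.76])

# Paired t-test on the fold-wise differences (9 degrees of freedom).
t_stat, p_value = stats.ttest_rel(acc_a, acc_b)
```

Note that the 10 folds are not fully independent, so the test is known to be somewhat optimistic; corrected variants exist.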

Scenario:

Suppose I have 5 subjects (S1, S2, ..., S5). Data for each subject are recorded in 4 different sessions, which also vary in size. If I construct a matrix for Subject 1, it looks like the following, where each row represents one complete feature vector but the number of instances varies:

Sub1 Session 1 = [20*100]

Sub1 Session 2 = [10*100]

Sub1 Session 3 = [6*100]

Sub1 Session 4 = [13*100]

and the variation goes on for the other subjects.

Question: Is it fine to use the above data for SVM, or should it be as follows?

Sub1 Session 1 = [6*100]

Sub1 Session 2 = [6*100]

Sub1 Session 3 = [6*100]

Sub1 Session 4 = [6*100]

In the negative selection algorithm, how should we find proper parameters like the affinity threshold?

I have three features of lengths 100, 25 and 128 for every subject. I want to use SVM. What is the best approach? Of course, scaling/normalization will be done.

Question 1: Should I concatenate these three features into one vector for every subject, or is there a more appropriate way to deal with this?

Question 2: Is feature extraction an art, based on gut feeling more than engineering?

I'm trying to collect information on this area, but most existing papers seem to target network-based DoS detection, whereas I could not find much about application-level approaches.

I want to compute the conditional cross entropy for a distribution like P(X1,Y|X2), where X1 and X2 are independent of each other.

P(X1,Y|X2) can be written as P(X1,X2,Y)/P(X2) = P(Y|X1,X2)*P(X1|X2)*P(X2)/P(X2) = P(Y|X1,X2)*P(X1|X2), and since X1 and X2 are independent, P(X1|X2) = P(X1). So, ultimately, P(X1,Y|X2) = P(Y|X1,X2)*P(X1). Now what will the entropy of the distribution P(Y|X1,X2)*P(X1) be? How can it be implemented for some raw data in MATLAB?
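Using the chain rule H(X1,Y|X2) = H(X1,X2,Y) - H(X2), the quantity can be estimated from raw data by counting. A sketch in Python rather than MATLAB (the toy columns are illustrative; the identity translates directly):

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of a sample list."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Toy raw data; real observations would replace these columns.
x1 = [0, 0, 1, 1, 0, 1, 0, 1]
x2 = [0, 1, 0, 1, 0, 1, 0, 1]
y  = [0, 1, 1, 0, 0, 1, 1, 0]

# H(X1, Y | X2) = H(X1, X2, Y) - H(X2), by the chain rule.
h_cond = entropy(list(zip(x1, x2, y))) - entropy(x2)
```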

Mahout is a solution of Apache Foundation to build scalable machine learning libraries.

Most of the literature uses processing of video/camera images. I need a simpler solution, and I also need to avoid the ethical issues of videoing people.

I have 18 input features for a prediction network, so how many hidden layers should I use, and how many nodes should those hidden layers have? Is there a formula for deciding this, or is it trial and error?

The best thesis I found in this field is "Efficient Boosted Ensemble-Based Machine Learning In The Context Of Cascaded Frameworks", written by Teo Susnjak. It is a very useful resource and also a new one, but I need more resources.

I am doing a project on prediction with neural networks. I have searched through a lot of material and found that most programmers prefer MATLAB or C++; Java or C# is very rarely used. Use of MATLAB is not allowed for me, so I want to know which I should use, C++ or Java.

Is there any problem with Java or its libraries?

And finally, which should I use, Java or C++? (I am more comfortable with Java but could also become familiar with C++.)

By overlapping clustering I mean clustering where an object may belong to several clusters. By extrinsic evaluation I mean that I have the ground truth (a list of correct clusters) and I want to compare the output of my cluster algorithm against the ground truth. The clustering algorithm needs to determine the number of clusters, so the metrics should be robust against the number of clusters.

I've implemented a GMM for one class of the data. Then I performed ROC analysis, and the diagram is not bad: I achieved 83% accuracy with a threshold. The problem is that the threshold is pretty small, 0.36e-25, and the probabilities of the test data are all near zero too. Do you think this is unreasonable, or is it a serious problem?
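Near-zero densities are usually a numerical-scale issue rather than a modeling error: thresholding log-densities keeps the cutoff in a comfortable range. A sketch with scikit-learn's GaussianMixture, assumed here as a stand-in for the poster's own implementation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 5))  # stand-in data

gmm = GaussianMixture(n_components=2, random_state=0).fit(X_train)

# score_samples returns LOG-densities, so the threshold stays in a
# numerically comfortable range instead of values like 0.36e-25.
log_density = gmm.score_samples(X_train)
threshold = np.quantile(log_density, 0.05)  # e.g. flag the lowest 5%
flagged = log_density < threshold
```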

In a simple experiment, I have a dataset with 200 samples, each with 5 attributes and a binary class label. 3 of these attributes are supposed to have equal values for all samples in the dataset (suppose they are 3 fixed numbers). Now I classified the dataset with random forest in R, but the result is strange.

With these 3 attributes in the dataset, classification goes wrong and all the samples get just 1 label (say label A).

Eliminating these 3 fixed attributes, classification is correct with more than 96% accuracy.

The question is: what is the reason for this different result?

What is the effect of eliminating some fixed equal attributes on the classification problem?

What are the latest approaches for object detection and which (if applicable) are the machine learning algorithms used (i.e. SVM, Adaboost, Neural Networks)?

What are the methods or best predictive methods to use for this kind of data?

I have a text classification task and used a sliding-window method to populate my data set. The problem is that the data set is huge and the data points are very similar. I would like to reduce the data set without losing informative data points. I am aware of variable selection techniques such as kruskal.test, limma, rfe, rf, lasso, etc., but how can I choose a suitable method for my problem without computationally intensive operations?

I am trying to find the baseline of a handwritten word. To understand what a baseline is, see the link below from Wikipedia (it's the red line in the picture). Do you know any algorithms for this?

I am using fuzzy rules to solve one of my research problems. Since the rules are exhaustive, I need a fuzzy-rule learning technique that can evolve the rules on its own. Which technique is best: genetic algorithms, neuro-fuzzy, or something else?

I know Multidimensional Scaling uses only a distance matrix, but Self-Organizing Map requires coordinates of points in the original space. What are some other dimensionality reduction techniques, such as Multidimensional Scaling, that need only a distance matrix rather than point coordinates?
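For reference, MDS in scikit-learn accepts a precomputed distance matrix directly; a minimal sketch with a toy 4x4 matrix (t-SNE with metric="precomputed" and spectral embedding with a precomputed affinity are similar options):

```python
import numpy as np
from sklearn.manifold import MDS

# A symmetric distance matrix: no point coordinates are needed.
D = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0, 2.0],
              [2.0, 1.0, 0.0, 1.0],
              [3.0, 2.0, 1.0, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)  # 2-D embedding of the 4 objects
```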

Noam Chomsky:

Ref1: "Noam Chomsky on Where Artificial Intelligence Went Wrong"

Peter Norvig:

Ref2: "On Chomsky and the Two Cultures of Statistical Learning"

All of the above have some common ground. In your opinion, what is the real difference between these fields? If you were asked to classify your research area, what would it be?

What is the latest technology in advanced machine learning (AML), besides natural language processing and neural networks?

On the face of it, topic modelling, whether it is achieved using LDA, HDP, NNMF, or any other method, is very appealing. Documents are partitioned into topics, which in turn have terms associated to varying degrees. However, in practice there are some clear issues: the models are very sensitive to the input data (small changes to the stemming/tokenisation algorithms can result in completely different topics); topics need to be manually categorised in order to be useful (often arbitrarily, as the topics often contain mixed content); and topics are "unstable" in the sense that adding new documents can cause significant changes to the topic distribution (less of an issue with large corpora).

In the light of these issues, are topic models simply a toy for NLP/ML researchers to show what can be done with data, or are they really useful in practice? Do any websites/products make use of them?

This is a toy example with four variables v,w,x,y and a binary outcome variable z. What approach would you take to find a statistical or algorithmic model to predict "z"? If you choose to perform the analysis, what is your cross-validated accuracy (10-fold)? What conclusions about the data-generating mechanism would you draw?

I tried increasing the heap size by editing the RunWeka.ini file, changing it from 256m to 1024m, since I am using a large dataset. After changing the file, I saved it and ran WEKA again, but I still get the error. Was the change not applied, or is 1024m not enough? If so, how do I increase it further?

My claim is that none of the accepted concepts of class are *adequate*. Is this situation ok? Is this ok to do classification without an adequate (central) concept of class available in the model? Do you think this situation cannot be avoided?

CAUTION: This is a fundamental scientific question with radical applied implications.

Is there any way a program can emit code, not by doing string concatenation, but by logically generating code based on observation?

For doing cross-validation, is it the parameter x that has to be used, or some other parameter? I am confused. For 10-fold cross-validation, will -x 10 be fine?

Also, in SVMlight, how do I generate a ROC curve, and how do I change the threshold values in the SVM?

Because the present forms of data representation---including vector space, logical, graphs, etc.---have not been uniformly useful, does it mean that, in general, there will not be a single universally preferable form of data representation which would capture previously inaccessible and more complete view of data objects?

CAUTION: This is a serious scientific question with radical applied implications.

Neural networks very loosely imitate and are inspired by the human brain, and as such have some of the strengths of the human brain, like discrimination/pattern recognition, and some of the weaknesses, like forgetting and not being able to multiply two n-digit numbers efficiently. Neural networks are used in machine learning, a subset of AI, and in AI itself in many different applications.

I'm researching different semantic distance measures for words and would like to have a test to compare them. I'm looking for the "TOEFL Synonym Questions" from the TOEFL test, which others have used. It is linked here: http://www.aclweb.org/aclwiki/index.php?title=TOEFL_Synonym_Questions_(State_of_the_art) but no one at the linked mail addresses has reacted to my requests. Maybe someone here knows where to get the questions?

I'm also open to suggestions for other test sets to benchmark semantic distance measures for words.

So if I use an NN with 5 layers (3 hidden layers h1, h2, h3, plus input and output) to compress an image, such that the size of h1 equals that of h3 and h2 < h1, how do I reconstruct the image from the output?

Can anyone help with how to classify 20 visual concepts? Suppose we have visual concepts such as cars, cats, mountains, moon, and night. I have a data set for these visual concepts based on tags and MPEG-7. I trained on a single concept (car) with the help of libsvm, but the results were not satisfying.

The data set contains approx. 15,000 images belonging to all 20 concepts. I've put car as the positive example and all other concepts as negative. Is this a problem with the libsvm tool? I used a linear classifier in libsvm.

Today's ITS research is dominated by highly ambitious machine learning techniques. However, in my view, education as a process should be mechanized as little as possible. Thus I am working on the development of an ITS using production rules written in JESS.

My concern is whether a rule-based implementation will find acceptance among the publication watchdogs, or whether they would throw it in the dustbin, influenced by the overwhelming popularity of machine learning techniques.

If we want to define a size on a concept, what is the best criterion for doing so?

Is it possible to use a greedy set-covering algorithm and take the minimum number of spheres covering the positive instances as a concept complexity measurement?

At the implementation stage of a machine learning model, is there a rule of thumb for preferring MPI over a GPGPU architecture?

I was wondering whether there are competitions for Bayesian network structure learning algorithms, inside the machine learning community. I've read some papers on the topic, but it's hard to compare the efficiency of the different approaches, since they often use different metrics (AIC, BIC, MDL, loglikelihood, number of arcs that are not in the original model, ...) and some methods might be better with regard to certain parameters, and worse for others.

So, an "objective" (as much as possible) competition, evaluating the methods on a great number of different metrics, would be great to understand whether some approaches are actually better than others, of if they lay on the same "Pareto front".

Mainly how are training samples selected? Do we assign class labels to each pixel in an image?

What is the difference between active learning and passive learning?

Is there any way to identify the best kernel "a priori"?

I am looking for an implementation of compressive sensing.

Classical approaches perform badly on the minority class, even if the global learning is satisfactory.

I have 20 concepts. When I train (1 concept vs. all), the problem I face is that training takes a long time, and the accuracy I get when testing is about 50%.

When training (1 vs. 1) or (1 vs. k), the accuracy reaches 85%. My question is: which k concepts should I select while training the classifier? How can I find the nearest concepts?

I read an article by Pedro Domingos titled "A Few Useful Things to Know about Machine Learning" (Oct 2012), and I do not buy his claim that a 'dumb algorithm' with a tremendous amount of data will provide better results than moderate data and a more clever algorithm. What if adding more data gives you more noise or irrelevant information? The only justification I see is that you can go through more iterations of the data and have more ways to learn from it. I can't see that claim as sufficient or sound enough to be valid.

What are your ideas about the Danger Theory algorithm? There are several algorithms for modelling and implementing artificial immune systems, like negative selection, clonal selection, and, recently, danger theory. I want to work on the Dendritic Cell Algorithm (DCA), but it is a little ambiguous. Any recommendations?

Suppose we deal with a dataset with different kinds of attributes (numeric and nominal) and a binary class. How can we find a single number as the Shannon entropy of this dataset (as a representation of the Kolmogorov complexity of this dataset)?
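One pragmatic estimate (an assumption on my part, not a standard definition) is the empirical entropy over the dataset's rows after discretising the numeric attributes; a sketch on toy data:

```python
import numpy as np

def dataset_entropy(rows):
    """Shannon entropy (bits) of the empirical distribution of distinct rows."""
    _, counts = np.unique(rows, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Mixed attributes: bin numeric columns, encode nominal ones as codes.
numeric = np.array([1.2, 3.4, 1.3, 3.5])
binned = np.digitize(numeric, bins=[2.0])      # -> 0/1 bins
nominal = np.array([0, 1, 0, 1])               # e.g. codes for "red"/"blue"
label = np.array([0, 1, 0, 1])                 # binary class

rows = np.column_stack([binned, nominal, label])
h = dataset_entropy(rows)
```

The result depends on the binning, which is one reason a single canonical number is hard to pin down.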

My svmlight output for binary classification (+1 for the true and -1 for the false class) looks like:

+1 0.001

-1 -0.34

+1 0.9

+1 055

-1 -0.2

I tried generating a ROC curve in R using the ROCR package, but I found that all the ROCR examples use TPR, FPR, and cutoff values. TPR and FPR can be calculated from the above data, but I will get only one value, and I am also confused about the cutoff: in this case the cutoff range is -1 to +1, right?

Can anyone help me with how to supply the values from the above data for drawing a ROC curve with the ROCR package?
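For comparison, the same kind of label/decision-value pairs can be fed to scikit-learn's roc_curve, which sweeps the cutoff over the observed scores automatically, so no cutoff needs to be fixed in advance (the toy scores below are illustrative, in the same format as the svmlight output):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# svmlight-style output: true label and decision value per test example.
labels = np.array([+1, -1, +1, +1, -1])
scores = np.array([0.8, -0.3, 0.9, 0.5, -0.2])

# roc_curve tries every observed score as a cutoff and returns the
# resulting (FPR, TPR) pairs plus the thresholds it used.
fpr, tpr, thresholds = roc_curve(labels, scores, pos_label=1)
area = auc(fpr, tpr)
```

ROCR's prediction()/performance() pair does the same sweep in R.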

Algorithms to deal with unbalanced clusters for classification?

Is cross-validation necessary in neural network training and testing?

More accurately, I mean not the narrow technical discussions but the discussions on the general direction of the corresponding natural science. And this is particularly conspicuous given the transitional nature of our scientific period, or it may be exactly the symptom of this transition. Do you know the reasons? (Is this a symptom of unconscious insecurity?)

I just wanted to start exploring the possibilities of simulating bioinformatics using artificial intelligence.

I'm fitting some multivariate observation sequences to a hidden Markov model using R's RHmm package, but I keep getting the following error: "Error in BaumWelch(paramHMM, obs, paramAlgo) : Non inversible matrix". Does anyone know what happened, or where I can find other implementations?

I have installed R and the ROCR package. I need to generate a ROC curve from my SVM output data. How do I load my data into R to run ROCR?

Does anybody know how to train sequences with an HMM using Kevin Murphy's toolbox? I want to train different sequences and then test which sequence belongs to which class.

I am using newff from the neurolab package in Python. The network is not learning for some sets of inputs and shows a constant error throughout the learning phase. Here are the error values the network displays:

Epoch: 100; Error: 149.999993147;

Epoch: 200; Error: 149.999992663;

Epoch: 300; Error: 150.0;

Epoch: 400; Error: 150.0;

Epoch: 500; Error: 150.0;

Epoch: 600; Error: 150.0;

Epoch: 700; Error: 150.0;

Epoch: 800; Error: 150.0;

Epoch: 900; Error: 150.0;

Epoch: 1000; Error: 150.0;

Epoch: 1100; Error: 150.0;

Epoch: 1200; Error: 150.0;

Epoch: 1300; Error: 150.0;

Epoch: 1400; Error: 150.0;

Epoch: 1500; Error: 150.0;

Can anybody shed some light on this situation? What might be the possible reason for this kind of result?

I usually use Latent Dirichlet Allocation to cluster texts. What do you use? Can someone give a comparison between different text clustering algorithms?

In machine translation there are a number of phrases whose translations are quite fixed and do not depend on the surrounding words. I have a list of such phrases and want Giza++ to take advantage of this prior knowledge while aligning source and target language phrases. This should help Giza++ reduce the effort of the alignment process and also reduce the size of the final phrase table. But how do I do this?

Machine learning is an essential part of many systems: communications, computer vision, AI, the web, etc. In your field, what do you think is really the top killer application of ML?

Basically, do they provide features that are better than SIFT? I went through quite a few papers; everyone talks about translation, viewpoint, and illumination invariance and so on, but the topic of scale invariance doesn't seem to be clearly stated, or at least I didn't find it.

According to my review of related literature, I think the sparse coding may be the basis of deep learning, i.e., the features in the lowest level of deep learning structures come from the dictionary of sparse coding. Is this statement right?

Does anyone care to share how to learn Matlab for feature selection? I have no prior background in Matlab, so perhaps you could share a website with existing code that I could adapt.

I want to use Matlab to analyze features in malware API calls.

Since a random forest comprises a bunch of random decision trees, it is not clear what we mean by forest size; it could be:

1) number of bits it takes

2) number of decision trees included in forest

3) accuracy

4) execution time

5) a combination of all of the above

6) etc.

What is the most important and what is the best?

I want to classify images based on SIFT features; what tool can I use to extract them? I have used "http://koen.me/research/colordescriptors/" to extract SIFT features, but the problem I am facing is that the file produced after extraction is too large. I cannot pass that file to the SVM, as one file has approx. 12,000 rows. The images I am using have dimensions of e.g. 1024x683. The SIFT feature file must contain less information so that I can pass hundreds of images to the SVM at the same time.

Will the RapidMiner tool help me draw a ROC curve? Moreover, I tried installing it but could not.

This is related to machine learning: digit/object recognition using KNN, which is a supervised learning algorithm based on instance/lazy learning without generalization; it is non-parametric in that it makes no assumption of a normal (Gaussian) distribution. I am trying to implement it in Octave, Python, and Java.
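Of the three target languages, Python with scikit-learn gives the shortest working baseline; a sketch on the built-in digits data (dataset and parameters are illustrative, not from the question):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 8x8 handwritten digit images, flattened to 64-feature vectors.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Lazy learner: fit() just stores the training set; work happens at predict().
knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
acc = knn.score(X_te, y_te)
```

The Octave and Java versions follow the same pattern: store the training set, then classify each query by majority vote among its nearest neighbors.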

I have a binary classification problem with a data set that is missing a large amount of data. I have been reading papers about data imputation, but haven't reached a solid conclusion about which method is best. Note that I will most likely be using SVM for classification.