Friday, March 30, 2012

How to get the Training Error from the Model?

Hi, everyone here.

I am trying to get the training error of the model processed which can reveal how much the model fits the cases. The training error can reveal how many cases (from training set) are classified correctly. The lower traing error is, the more the model fits the training set. (Maybe overfitted) But I found it hard to get. I saw the life chart in AS 2005 which I am not quite understand and don't know how to code it in my program.

Is there some way to getting traing error or predicting error?

I am now using this awful way to get the training error:

select t.*,CollegeTree.CollegePlans as pred

from collegetree
prediction join
openquery(DSource,'select * from CollegePlans') as t
on CollegeTree.StudentID = t.StudentID and
...
where t.CollegePlans = CollegeTree.CollegePlans;

and then use datareader.ItemCount to get the count of cases which classified correctly.

Keyword: train error,predict error, data mining, analysis service

Instead of the lift chart, you could use the classification matrix (it is next to the lift chart in BI dev studio).

The Classification Matrix gives you the exactly the number of correct vs. incorrect classifications

|||Thx, it is what I need. I am still wondering how to program with the classification matrix. Can I just get it from SQL statement or AMO? And that classification matrix need a prediction procedure that may take too long time. Could I get the training error just after the model trained?|||Currently, there is no way to get the training error just after the model trained.
The Classification Matrix is a stored procedure call. The syntax is a bit complex (an XML encoding of the queries used to compute the matrix) and the easiest way to get the statement is to
- launch SQL Server Profiler
- Connect to the Analysis Services server
- Launch the classification matrix in the tools
- look in the profiler for a statement beginning with CALL Microsoft.AnalysisServices.System.DataMining.GetConfusionMatricesUsingDatasource

Then, you can execute that statement as any other DMX statement|||

Thx for your advice, it's really helpful.

I'm now using

Code Snippet

select flattened topcount((select [SUPPORT] from NODE_DISTRIBUTION),[SUPPORT],1) from DModel.content where NODE_TYPE=4

to export the number of positive cases for every rule, i.e. the max [support] for the leaf node( NODE_TYPE=4),

and add them up.

Code Snippet

While DSReader.Read()
pCase = pCase + DSReader(0)
End While
DMTrainingError = pCase/TotalCasesCount

I hava checked the result with the BI Classification Matrix, which prooved my idea.

The rules after processing the model are stored in a table named <Model Name>.content, and the column of NODE_DISTRIBUTION is a nested table containing the cases distribution of each class, and the prediction follows the most probable class with the max support.

So pick up the max value of support in every NODE_DISTRIBUTION and add them up, then you will get the number of positive cases. The ratio of positive cases in all cases can be used as Training Error.

No comments:

Post a Comment