Showing posts with label training. Show all posts

Friday, March 30, 2012

How to get the Training Error from the Model?

Hi, everyone here.

I am trying to get the training error of a processed model, which shows how well the model fits the training cases. It tells me how many cases from the training set are classified correctly: the lower the training error, the more closely the model fits the training set (and possibly overfits it). But I have found it hard to get. I saw the lift chart in AS 2005, but I don't quite understand it and don't know how to use it from my program.

Is there some way to get the training error or the prediction error?

I am now using this awful way to get the training error:

SELECT t.*, CollegeTree.CollegePlans AS pred
FROM CollegeTree
PREDICTION JOIN
OPENQUERY(DSource, 'SELECT * FROM CollegePlans') AS t
ON CollegeTree.StudentID = t.StudentID AND
...
WHERE t.CollegePlans = CollegeTree.CollegePlans;

and then use datareader.ItemCount to get the number of cases that were classified correctly.

Keywords: training error, prediction error, data mining, Analysis Services

Instead of the lift chart, you could use the classification matrix (it is next to the lift chart in BI Dev Studio).

The classification matrix gives you the exact number of correct vs. incorrect classifications.
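To make the link between a classification matrix and the training error concrete, here is a minimal Python sketch. It is not Analysis Services code: the function name, the dictionary layout and the counts are all made up for illustration, and it only shows the arithmetic of turning correct vs. incorrect counts into an error rate.

```python
def error_from_confusion_matrix(matrix):
    """Return the misclassification rate.

    `matrix` maps actual class -> {predicted class -> case count}.
    This layout is an assumption for the sketch, not the format
    returned by Analysis Services.
    """
    total = sum(sum(row.values()) for row in matrix.values())
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Correct classifications sit on the diagonal (actual == predicted).
    correct = sum(row.get(actual, 0) for actual, row in matrix.items())
    return (total - correct) / total


# Hypothetical counts for a two-class model.
confusion = {
    "Yes": {"Yes": 40, "No": 10},
    "No": {"Yes": 5, "No": 45},
}
training_error = error_from_confusion_matrix(confusion)
print(f"training error: {training_error:.2f}")
```

If the matrix is computed on the training set, the result is the training error; computed on held-out cases, the same arithmetic gives the prediction error.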

Thx, that is what I need. I am still wondering how to work with the classification matrix from a program. Can I get it from a SQL statement or through AMO? Also, the classification matrix needs a prediction pass, which may take too long. Could I get the training error right after the model is trained?

Currently, there is no way to get the training error right after the model is trained.
The classification matrix is a stored procedure call. The syntax is a bit complex (an XML encoding of the queries used to compute the matrix), and the easiest way to get the statement is to:
- Launch SQL Server Profiler
- Connect to the Analysis Services server
- Launch the classification matrix in the tools
- Look in the profiler for a statement beginning with CALL Microsoft.AnalysisServices.System.DataMining.GetConfusionMatricesUsingDatasource

Then you can execute that statement like any other DMX statement.

Thx for your advice, it's really helpful.

I'm now using

Code Snippet

select flattened topcount((select [SUPPORT] from NODE_DISTRIBUTION),[SUPPORT],1) from DModel.content where NODE_TYPE=4

to export the number of correctly classified cases for every rule, i.e. the max [SUPPORT] of each leaf node (NODE_TYPE=4), and then add them up.

Code Snippet

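' pCase must start at 0; column 0 holds the largest SUPPORT of one leaf node
While DSReader.Read()
    pCase = pCase + CDbl(DSReader(0))
End While
' pCase / TotalCasesCount is the share classified correctly; the error is the remainder
DMTrainingError = 1 - pCase / TotalCasesCount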

I have checked the result against the BI classification matrix, which confirmed my idea.

The rules learned when the model is processed are stored in a table named <Model Name>.content. Its NODE_DISTRIBUTION column is a nested table containing the distribution of cases over the classes, and the prediction for a node is the most probable class, i.e. the one with the max support.

So pick the max support value in every NODE_DISTRIBUTION and add them up to get the number of correctly classified training cases. The ratio of that number to all cases is the training accuracy, and one minus that ratio is the training error.
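The arithmetic behind this approach can be sketched in a few lines of Python. This is only an illustration under assumptions: the leaf distributions below are invented, the function name is hypothetical, and the total case count is taken as the sum of all supports (which assumes every training case lands in exactly one leaf), whereas the post above uses a separately known TotalCasesCount.

```python
def training_error_from_leaves(leaf_distributions):
    """Estimate the training error of a decision tree from its leaves.

    Each item maps class label -> SUPPORT for one leaf node, mirroring
    the rows of NODE_DISTRIBUTION for NODE_TYPE = 4 (assumed layout).
    """
    total = sum(sum(dist.values()) for dist in leaf_distributions)
    if total == 0:
        raise ValueError("no training cases found in the leaves")
    # A leaf predicts its majority class, so the cases it gets right
    # are exactly the ones counted by its largest support value.
    correct = sum(max(dist.values()) for dist in leaf_distributions if dist)
    return (total - correct) / total


# Hypothetical leaf distributions for a two-class tree.
leaves = [
    {"Yes": 30, "No": 5},
    {"Yes": 10, "No": 40},
    {"Yes": 8, "No": 7},
]
leaf_training_error = training_error_from_leaves(leaves)
print(f"training error: {leaf_training_error:.2f}")
```

Here 78 of the 100 cases fall in the majority class of their leaf, so the accuracy is 0.78 and the training error is 0.22.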


Friday, March 9, 2012

How to get prediction for monthly intervals?

hi,

I am using the time series algorithm. I am training my model with data like the following:

Date       StudId  Perf
5/1/2005   001     99
5/10/2005  001     97.6
6/1/2005   001     94
6/10/2005  001     99
6/30/2005  001     96
10/1/2005  001     100

Like that.

I need prediction output like the following:

Date       StudId  Perf
10/1/2005  001     99
11/10/2005 001     97.6
12/1/2005  001     94
1/10/2006  001     99

... How do I write a prediction query for this?

Thanks

Karthik.

Have you tried a statement like the one below?

SELECT FLATTENED PredictTimeSeries(Perf) FROM Model

I assume your model has Date as KEY TIME and StuId as KEY, right?

Yes bodgan, you are right.

So, does the query I suggested work correctly? Or is there a different kind of result that you are looking for?

No bodgan,

the problem is the data. Please look at the data I am using to train and at my expected prediction result. I expect the dates in the prediction result to fall at monthly intervals.

Now I am getting results like this:

Date        StuId  perf
10/1/2006   001    94
10/15/2006  001    99
11/10/2006  001    100
11/31/2006  001    75

Like that.

I need:

Date        StuId  perf
10/1/2006   001    94
11/1/2006   001    99
12/1/2006   001    100
1/1/2007    001    75

Like that.

Thanks,

Karthik

I see. With the specified input, you cannot get these results. Forecasting works by predicting the next N steps of the series, so to get the results you want, your input dates should be 10/1/2006, 11/1/2006 and so on. One solution is to average the values per month for each student ID. After that, as long as your input contains one entry per student per month, the prediction will return one result per month per student.
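The per-month averaging suggested here could be done in the source query or in a preprocessing step. The Python sketch below shows one possible preprocessing step, using the sample rows from the question. The function name and the tuple layout are assumptions made for this example, and each month is stamped with its first day so that the series advances in monthly steps.

```python
from collections import defaultdict
from datetime import date


def monthly_average(rows):
    """Collapse (date, student_id, perf) rows to one row per student per month.

    Returns (first_day_of_month, student_id, mean_perf) tuples,
    ordered by student and then by month.
    """
    buckets = defaultdict(list)
    for day, student_id, perf in rows:
        # Group by student and by the first day of the observation's month.
        buckets[(student_id, day.replace(day=1))].append(perf)
    return [
        (month, student_id, sum(values) / len(values))
        for (student_id, month), values in sorted(buckets.items())
    ]


# Sample training rows taken from the question.
rows = [
    (date(2005, 5, 1), "001", 99),
    (date(2005, 5, 10), "001", 97.6),
    (date(2005, 6, 1), "001", 94),
    (date(2005, 6, 10), "001", 99),
    (date(2005, 6, 30), "001", 96),
    (date(2005, 10, 1), "001", 100),
]
monthly = monthly_average(rows)
for month, student_id, perf in monthly:
    print(month.isoformat(), student_id, round(perf, 2))
```

Note that this only merges observations within a month. Months with no observations at all (July to September in the sample) stay absent, so such gaps would still have to be filled or otherwise handled before training.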