Good morning everybody,
I'm here once again asking for help. Here's my problem:
I have a table that I use with my third-party text-mining clustering algorithm. The table is as follows:
ID: primary key (int)
ID_DOC: an ID I use to know which document a term came from (int)
ID_TERMO: the term ID (int)
TERMO: the term itself (varchar(max))
FREQ: the term frequency, usually TFxIDF (real)
Alright... I need the term itself to do some visualization. What I want to do is get each case's term and store them in a vector<char*>. So I made some changes to the ProcessCase method. In this method, the data (each variable of the case) can be accessed by iterating over the case with a DM_STATE_VALUE*, as follows:
DM_STATE_VALUE* xpto = (DM_STATE_VALUE*) rgValue;
for (unsigned int i = 0; i < _cAttribute; i++) { // _cAttribute is the number of variables
    double aux = xpto[i].r8; // read the i-th variable of the case
    // Do some calculation using aux
}
Alright... It works quite well when I only have real and int data types in the database. But how do I get the term column (varchar(max))?
How can I modify the DM_STATE_VALUE union to accept strings as well?
Any clues are welcome.
Thanks a lot once more,
-Renan
DM_STATE_VALUE is used to represent the state of an attribute. The state is always a number (integer, representing the state index, for discrete attributes and double, representing the actual value, for continuous attributes). That being said, it is possible to retrieve the actual state value (whatever the original data type was) as well as the actual attribute name from within a plug-in.
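To make the index-versus-value distinction concrete, here is a minimal, self-contained sketch. It does not use the plug-in SDK: StateValue, DiscreteAttribute, ResolveState and SumContinuous are hypothetical stand-ins invented for illustration, and the real DM_STATE_VALUE layout may differ. It only shows the idea that a case carries numbers, while the original strings live in a per-attribute dictionary that is looked up by state index.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the SDK's DM_STATE_VALUE (NOT the real definition):
// one numeric slot per attribute of the case.
union StateValue {
    unsigned long index; // discrete attribute: index of the state
    double        r8;    // continuous attribute: the value itself
};

// Hypothetical per-attribute dictionary; by convention states[0] is "Missing".
struct DiscreteAttribute {
    std::vector<std::string> states;
};

// Resolve a state index to the original string, falling back to "Missing"
// when the index is out of range.
std::string ResolveState(const DiscreteAttribute& attr, const StateValue& v) {
    if (v.index < attr.states.size()) return attr.states[v.index];
    return "Missing";
}

// Walk a case the way ProcessCase does: one slot per attribute, indexed by i.
double SumContinuous(const StateValue* rgValue, std::size_t cAttribute) {
    double total = 0.0;
    for (std::size_t i = 0; i < cAttribute; ++i) total += rgValue[i].r8;
    return total;
}
```

The point of the sketch is that the union never needs to hold a string: the string is recovered from the index when it is needed.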
I assume that your model looks (based on the data description above) more or less like this:
Model
(
    ID LONG KEY,
    Terms TABLE
    (
        TERMO TEXT KEY,
        FREQ DOUBLE CONTINUOUS
    )
)
In this case, each individual term generates one attribute: Terms(<term>).Freq. The existence of a keyword in a document is implied by a non-missing value of Freq (more details here: http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/1090.aspx).
Now, inside your plug-in algorithm, you can extract the short/long name of each attribute (therefore, the actual keyword) with a call to the Attribute Set object. The call would look like:
m_spAttributeSet->GetAttributeName(...)
It takes a string handle (DMStringPtr) and populates it with the (long or short) name of the attribute. The short name will look like <Term>.Freq, the long name like Terms(<Term>).Freq.
Also, a call to the attribute set's GetAttributeValue returns the actual value behind a state index.
A more detailed discussion on how to fetch attribute names and their values inside a plug-in is available here: http://www.sqlserverdatamining.com/DMCommunity/Newsgroup/266.aspx
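Once the attribute name is available as a plain string, the keyword can be cut out of it. The following is a rough sketch that assumes the two name shapes described above (<Term>.Freq and Terms(<Term>).Freq) and the table and column names Terms and Freq from the assumed model. ExtractTerm is a hypothetical helper, not part of the SDK, and it does not account for any escaping the server may apply to special characters in names.

```cpp
#include <string>

// Hypothetical helper: extract the keyword from "Terms(<term>).Freq" (long name)
// or "<term>.Freq" (short name). Returns an empty string for any other name.
std::wstring ExtractTerm(const std::wstring& name,
                         const std::wstring& table = L"Terms",
                         const std::wstring& column = L"Freq") {
    const std::wstring suffix = L"." + column;
    // The name must end with ".Freq" to be a keyword attribute.
    if (name.size() < suffix.size() ||
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return L"";
    }
    std::wstring body = name.substr(0, name.size() - suffix.size());
    // Long names wrap the keyword as "Terms(<term>)"; unwrap it if present.
    const std::wstring prefix = table + L"(";
    if (body.size() > prefix.size() &&
        body.compare(0, prefix.size(), prefix) == 0 &&
        body.back() == L')') {
        body = body.substr(prefix.size(), body.size() - prefix.size() - 1);
    }
    return body;
}
```

Requesting the short name keeps the parsing simpler, since only the column suffix has to be removed.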
|||Dear Mr. Crivat,
Your post was very helpful, as usual. But I still have some questions on this matter:
So GetAttributeValue is the function I need to call. But it is as follows:
virtual HRESULT STDMETHODCALLTYPE GetAttributeValue(
    /* [in] */ IDMContextServices *in_Context,
    /* [in] */ DM_Attribute in_Attribute,
    /* [in] */ DMVariantPtr io_pvarValue) = 0;
I think in_Attribute is the variable index. That's fine...
What is an IDMContextServices, and what do I need to do to create it correctly so I can pass it to GetAttributeValue? And where is the string I'm looking for placed? The variant pointer is marked as I/O. Will it point to the string? If so, why is it a typedef of INT_PTR?
Would you be so kind as to explain this with the code I need to get the string field?
I'm very grateful for your help again.
Thanks very much,
-Renan
EDITED: I don't know if I've made myself clear enough... I know that IDMContextServices is a kind of request interface and that I need to manage it using a smart pointer. My question is how to instantiate it correctly with the information of the current case.
|||Hello, Renan
You should not create an IDMContextServices yourself. Such a pointer is received as an argument in all the plug-in algorithm methods. It serves (inside the server) to identify the context of the current call. The io_pvarValue parameter is a variant handle (that is, it represents a server variant object). On input, it contains the state index of the value that you want to receive (an attribute has multiple states; typically, for strings, these states are 0-Missing, 1, 2, 3 -- one for each distinct string state of the attribute).
A variant handle must be manipulated using the IDMVariantPtrHandler interface.
Therefore, assuming you want to call GetAttributeValue to get the actual string for a certain attribute inside an algorithm interface method (such as GetNodeProperty), here is what you should do:
// The algorithm method receives a ContextServices pointer as argument
STDMETHODIMP NAVIGATOR::GetNodeProperty(
    /* [in] */ IDMContextServices* in_pContext,
    /* [in] */ DM_NODE_PROP in_PropID,
    /* [in] */ DMVariantPtr io_pValue)
...
CComPtr<IDMVariantPtrHandler> spVarHandler; // object to manipulate variant handles (its initialization is not shown here)
DMVariantPtr hAttValue; // Variant handle object (its allocation is not shown here)
VARIANT vtVal; // Windows variant used to exchange values with the handle
::VariantInit(&vtVal); // Initialize the Windows variant object
V_VT(&vtVal) = VT_I4; // set the variant type to I4, integer, as it will hold a state index
V_I4(&vtVal) = 1; // set the variant value to the index of the state you want to retrieve
spVarHandler->CopyVariantToHandle(hAttValue, &vtVal); // copy the Windows variant into a server variant handle
// Now get the attribute value. Note that hAttValue contains, on input, the attribute state index and, on output, the attribute state actual value
// Also, note that the in_pContext parameter is the one received as argument for the current method, GetNodeProperty
m_pAlgo->m_spAttributeSet->GetAttributeValue(in_pContext, (DM_Attribute)Attribute, hAttValue);
spVarHandler->CopyHandleToVariant(hAttValue, &vtVal); // Copy the variant handle, which now contains the state, into a Windows variant
::VariantChangeType(&vtVal, &vtVal, 0, VT_BSTR); // Change the Windows variant's type to BSTR
wchar_t* pszAttVal = V_BSTR(&vtVal); // use the BSTR member of the Windows variant; copy the string if it must outlive the VariantClear below
::VariantClear(&vtVal); // Clear the Windows variant (this frees the BSTR, so pszAttVal is invalid afterwards)
spVarHandler->DestroyVariantHandle(hAttValue); // Clear the server variant handle
The code above should use some error-handling mechanism, since each call may fail. I removed the error handling for clarity.
I hope this clarifies the usage of GetAttributeValue. For a discrete/discretized attribute, io_pvarValue should contain the state index as I4 on input and will contain the state value, using whatever the actual data type is, on output. For a continuous attribute, io_pvarValue should contain the actual value as R8 (double) on input and will contain the state value converted to the actual data type (i.e. Long, Double or DateTime) on output.
That being said, I may have misunderstood your implementation, but you may be on the wrong track. For a TEXT attribute, each distinct string is one possible state of the attribute. However, if you are using a nested table to represent the keywords for each document, then each individual keyword becomes an attribute having two states ("Missing" and "Existing"). If that is the case, then GetAttributeValue for an attribute that represents a keyword will return "Existing" and not the keyword. Use GetAttributeName to get the actual keyword, and pass "true" as the value for the ShortName parameter. But, again, I might have misunderstood what you are trying to do.
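For the nested-table case, collecting the keywords of one document then amounts to "take the name of every attribute whose value is not missing". Below is a minimal sketch of that loop with the server calls abstracted away: CaseSlot, NameLookup and TermsForCase are hypothetical names invented here, the lookup callback stands in for a wrapper around GetAttributeName with the short-name flag set, and the way a missing value is detected in a real case is an assumption that has to be adapted to the SDK.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-attribute slot for one case: a missing flag plus the Freq value.
struct CaseSlot {
    bool   missing;
    double freq;
};

// Hypothetical stand-in for a wrapper that returns the short attribute name
// ("<term>.Freq") for attribute index i.
using NameLookup = std::function<std::wstring(std::size_t)>;

// Collect the keywords present in one case: every attribute whose Freq is not
// missing contributes its name, with the ".Freq" suffix removed.
std::vector<std::wstring> TermsForCase(const std::vector<CaseSlot>& slots,
                                       const NameLookup& shortNameOf) {
    const std::wstring suffix = L".Freq";
    std::vector<std::wstring> terms;
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (slots[i].missing) continue; // keyword does not occur in this document
        std::wstring name = shortNameOf(i);
        if (name.size() > suffix.size() &&
            name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
            name.erase(name.size() - suffix.size()); // drop the column suffix
        }
        terms.push_back(name);
    }
    return terms;
}
```

The sketch returns owning std::wstring copies rather than raw char* pointers, so the collected terms stay valid after the buffers they were read from have been released.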
Regards,