Request Templates
This section introduces request templates and explains how they help detach algorithm configuration from the actual clustering requests.
As mentioned in the introduction, a clustering query is self-contained and must carry the three required elements: algorithm name, language and documents to be clustered. In practice, the first two elements will rarely change – once the algorithm for a particular task is chosen, it will probably remain constant for all clustering requests. In situations like this we can pull out the constant part of the query into a template and use basic composition with the body of a query to get the final request.
Another useful application of templates is to detach the algorithm and its configuration from the logic that queries the clustering service, or to group different algorithms and their tuning variants under easier-to-remember aliases.
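The composition described above can be pictured as overlaying the request body on top of the template. The following is only a sketch of that idea: the compose function is hypothetical, and the rule it uses (nested objects merged recursively, values from the request body taking precedence) is an assumption, since this section does not spell out how the DCS merges the two.

```python
import json


def compose(template: dict, body: dict) -> dict:
    """Overlay the request body on the template (hypothetical merge rule).

    Assumption: nested objects are merged key by key and the request
    body wins on conflicts. The DCS may apply different rules.
    """
    merged = dict(template)
    for key, value in body.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = compose(merged[key], value)
        else:
            merged[key] = value
    return merged


# The constant part of the query, pulled out into a template.
template = {"language": "English", "algorithm": "Lingo"}

# The part that changes with every request.
body = {"documents": [{"title": "PDF Viewer on Windows"}]}

request = compose(template, body)
print(json.dumps(request, indent=2))
```

The composed request carries all three required elements (algorithm, language and documents) even though the body supplied only the documents.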
How to create a request template?
Let's explain it by walking through a simple example. For a request body like the one shown below, we can pull out everything up to the documents section:
{
"language": "English",
"algorithm": "Lingo",
"documents": [
{ "title": "PDF Viewer on Windows" },
{ "title": "Firefox PDF plugin to view PDF in browser on Windows" },
{ "title": "Limit CPU usage for flash in Firefox?" }
]
}
Let's create a file called 04 foo.json under the DCS-relative path web/service/templates/04 foo.json, with the content shown below. The name (identifier) of the template is embedded in the file name: it is foo. The number in front of the alphanumeric name is always stripped (but it is used to sort the templates for display purposes).
{
"language": "English",
"algorithm": "Lingo",
"parameters": {
"preprocessing": {
"phraseDfThreshold": 1,
"wordDfThreshold": 1
}
}
}
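The file-naming rule can be illustrated with a short sketch. The template_name function below is hypothetical and only mimics the behaviour described here; it assumes the numeric prefix is separated from the name by whitespace, as in 04 foo.json, and it is not the code the DCS uses.

```python
import re
from pathlib import Path


def template_name(file_name: str) -> str:
    """Derive a template identifier from its file name (illustrative only).

    Assumption: the identifier is the file name without its extension
    and without the leading digits and whitespace.
    """
    stem = Path(file_name).stem          # "04 foo.json" -> "04 foo"
    return re.sub(r"^\d+\s*", "", stem)  # "04 foo" -> "foo"


print(template_name("web/service/templates/04 foo.json"))
```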
The foo template contains the constant (but overridable) elements of our clustering query. Once the DCS is started, we can check whether the template has been loaded by looking at the startup log messages or by running a GET query against the /list endpoint:
...
"templates" : [
"frontend-default",
"lingo",
"stc",
"bkmeans",
"foo"
]
}
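The same check can be scripted. The sketch below assumes that the /list endpoint lives next to /cluster on a local DCS (http://localhost:8080/service/list) and that the response is a JSON object with a templates array, as in the excerpt above; the function names are made up for this example.

```python
import json
from urllib.request import urlopen

# Assumed URL: /list next to /cluster on a DCS running locally.
LIST_URL = "http://localhost:8080/service/list"


def has_template(list_response: dict, name: str) -> bool:
    """Return True if the parsed /list response mentions the template."""
    return name in list_response.get("templates", [])


def fetch_list(url: str = LIST_URL) -> dict:
    """Run a GET query against the /list endpoint and parse the JSON."""
    with urlopen(url) as response:
        return json.load(response)
```

With the DCS running, has_template(fetch_list(), "foo") should return True once the template file has been picked up.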
Armed with the template foo, we can now assemble a much simpler clustering request containing just the documents to be clustered:
{
"documents": [
{ "field": "foo bar" },
{ "field": "bar" },
{ "field": "baz" }
]
}
When posting the documents for clustering, use the template request parameter to provide the template name (foo in our case):
curl -X POST --header "Content-Type: text/json" --data-binary @template-request.json "http://localhost:8080/service/cluster?indent&template=foo"
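For callers that prefer a script over curl, the same request can be assembled with the Python standard library. This is a minimal sketch: the URL, the indent and template parameters and the Content-Type header are copied from the curl command above, while the function names are invented for this example and the response is assumed to be JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Same service URL as in the curl example.
CLUSTER_URL = "http://localhost:8080/service/cluster"


def build_cluster_request(documents: list, template: str) -> Request:
    """Build a POST request that carries only the documents.

    The algorithm and language come from the named template.
    """
    query = urlencode({"template": template})
    body = json.dumps({"documents": documents}).encode("utf-8")
    return Request(
        f"{CLUSTER_URL}?indent&{query}",
        data=body,
        headers={"Content-Type": "text/json"},
        method="POST",
    )


def cluster(documents: list, template: str) -> dict:
    """Send the request to a running DCS and parse the response."""
    with urlopen(build_cluster_request(documents, template)) as response:
        return json.load(response)
```

Calling cluster([{"field": "foo bar"}, {"field": "bar"}, {"field": "baz"}], "foo") against a running DCS would send the same payload as template-request.json.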