This is unreleased documentation for Rasa Documentation Main/Unreleased version.
For the latest released documentation, see the latest version (3.x).
rasa.nlu.featurizers.sparse_featurizer.lexical_syntactic_featurizer
LexicalSyntacticFeaturizer Objects
Extracts and encodes lexical syntactic features.
Given a sequence of tokens, this featurizer produces a sequence of features
where the t-th feature encodes lexical and syntactic information about the t-th
token and its surrounding tokens.
In detail: The lexical syntactic features can be specified via a list of
configurations [c_0, c_1, ..., c_{n-1}] where each c_i is a list of names of
lexical and syntactic features (e.g. low, suffix2, digit).
For a given tokenized text, the featurizer will consider a window of size n
around each token and evaluate the given list of configurations as follows:
- It will extract the features listed in c_m, where m = (n-1)/2 if n is odd and
  m = n/2 if n is even, from token t.
- It will extract the features listed in c_{m-1}, c_{m-2}, ... from the last,
  second to last, ... token before token t, respectively.
- It will extract the features listed in c_{m+1}, c_{m+2}, ... from the first,
  second, ... token after token t, respectively.
It will then combine all these features into one feature for position t.
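The index arithmetic behind the window can be sketched in a few lines of Python. This is an illustrative sketch, not Rasa's implementation: the helper name window_offsets is hypothetical, and it assumes that configuration c_i is applied to the token at offset i - m from t, with m = n // 2 (which equals (n-1)/2 for odd n and n/2 for even n).

```python
from typing import List


def window_offsets(n: int) -> List[int]:
    """Returns, for each configuration index i, the token offset relative to t.

    Configuration c_i is applied to the token at position t + offsets[i].
    The centre configuration c_m (offset 0) is applied to token t itself.
    """
    m = n // 2  # (n-1)/2 for odd n, n/2 for even n
    return [i - m for i in range(n)]


print(window_offsets(3))  # [-1, 0, 1]
print(window_offsets(4))  # [-2, -1, 0, 1]
```

Under this rule, three configurations c_0, c_1 and c_2 apply to tokens t-1, t and t+1; with an even number of configurations the window reaches one token further back than forward.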
Example:
If we specify [['low'], ['upper'], ['prefix2']], then for each position t
the t-th feature will encode whether the token at position t is upper case,
whether the token at position t-1 is lower case, and the first two characters
of the token at position t+1.
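A minimal, self-contained sketch of the example configuration [['low'], ['upper'], ['prefix2']], assuming simple stand-in feature functions for low, upper and prefix2 and the hypothetical helper name featurize. It builds one combined dictionary per position t with keys of the form "<offset>:<feature>"; it does not reproduce how the real featurizer turns these values into sparse features, and it simply skips neighbours that fall outside the sentence.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in feature functions, limited to the names in the example.
FEATURES: Dict[str, Callable[[str], Any]] = {
    "low": lambda text: text.islower(),
    "upper": lambda text: text.isupper(),
    "prefix2": lambda text: text[:2],
}


def featurize(tokens: List[str], config: List[List[str]]) -> List[Dict[str, Any]]:
    """Builds one combined feature dict per token position t.

    A key such as "-1:low" means "the token one position before t is lower case".
    """
    m = len(config) // 2  # index of the centre configuration
    result = []
    for t in range(len(tokens)):
        combined: Dict[str, Any] = {}
        for i, names in enumerate(config):
            offset = i - m
            position = t + offset
            if not 0 <= position < len(tokens):
                continue  # no neighbouring token at this offset
            for name in names:
                combined[f"{offset}:{name}"] = FEATURES[name](tokens[position])
        result.append(combined)
    return result


features = featurize(["hello", "NEW", "world"], [["low"], ["upper"], ["prefix2"]])
print(features[1])  # {'-1:low': True, '0:upper': True, '1:prefix2': 'wo'}
```

For the middle token "NEW", the combined feature records that the previous token is lower case, that "NEW" itself is upper case, and that the next token starts with "wo".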
required_components
Components that should be included in the pipeline before this component.
get_default_config
Returns the component's default config.
__init__
Instantiates a new LexicalSyntacticFeaturizer instance.
validate_config
Validates that the component is configured properly.
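One way such a check could look, sketched under the assumption that a valid configuration is a list of lists of known feature names. KNOWN_FEATURES and validate_feature_config are hypothetical and contain only the feature names mentioned on this page; the actual component may accept more names and may raise a different exception type.

```python
from typing import Iterable

# Hypothetical set of known feature names (only those mentioned on this page).
KNOWN_FEATURES = {"low", "upper", "digit", "prefix2", "suffix2"}


def validate_feature_config(config: object, known: Iterable[str] = KNOWN_FEATURES) -> None:
    """Raises ValueError unless config is a list of lists of known feature names."""
    if not isinstance(config, list) or not all(isinstance(c, list) for c in config):
        raise ValueError(f"Expected a list of lists of feature names, got {config!r}")
    # Collect every configured name across all window positions.
    unknown = {name for c in config for name in c} - set(known)
    if unknown:
        raise ValueError(f"Unknown feature names: {sorted(unknown)}")
```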
train
Trains the featurizer.
Arguments:
training_data - the training data
Returns:
the resource from which this trained component can be loaded
warn_if_pos_features_cannot_be_computed
Warns if part-of-speech features are needed but not given.
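A hedged sketch of this kind of warning. It assumes, hypothetically, that part-of-speech features are named pos and pos2 and that a token without a tag is represented by None; the function warn_if_pos_missing is illustrative and not part of Rasa's API.

```python
import warnings
from typing import List, Optional

# Hypothetical names of part-of-speech features (not listed on this page).
POS_FEATURES = {"pos", "pos2"}


def warn_if_pos_missing(config: List[List[str]], pos_tags: List[Optional[str]]) -> bool:
    """Warns and returns True if POS features are configured but a token has no tag."""
    needs_pos = any(name in POS_FEATURES for names in config for name in names)
    if needs_pos and any(tag is None for tag in pos_tags):
        warnings.warn("Part-of-speech features are configured, but some tokens have no POS tag.")
        return True
    return False
```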
process
Featurizes all given messages in-place.
Arguments:
messages - messages to be featurized.
Returns:
The same list with the same messages after featurization.
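The in-place contract can be illustrated with a hypothetical Message stand-in (not Rasa's Message class) and a single trivial feature; the sketch only shows that the messages are mutated and that the very same list object is returned.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Message:
    """Hypothetical stand-in for a message: tokens plus a slot for features."""

    tokens: List[str]
    features: List[Dict[str, Any]] = field(default_factory=list)


def process(messages: List[Message]) -> List[Message]:
    """Attaches per-token features to each message in place and returns the same list."""
    for message in messages:
        message.features = [{"0:upper": token.isupper()} for token in message.tokens]
    return messages


batch = [Message(["Hi", "NASA"])]
print(process(batch) is batch)  # True
print(batch[0].features)  # [{'0:upper': False}, {'0:upper': True}]
```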
process_training_data
Processes the training examples in the given training data in-place.
Arguments:
training_data - the training data
Returns:
The same training data after processing.
create
Creates a new untrained component (see parent class for full docstring).
load
Loads a trained component (see parent class for full docstring).
persist
Persists this model (see parent class for full docstring).
