abacusai
Submodules
abacusai.algorithm
abacusai.api_key
abacusai.application_connector
abacusai.batch_prediction
abacusai.batch_prediction_version
abacusai.categorical_range_violation
abacusai.client
abacusai.code_source
abacusai.concatenation_config
abacusai.cryptography
abacusai.custom_train_function_info
abacusai.data_filter
abacusai.data_prep_logs
abacusai.database_connector
abacusai.dataset
abacusai.dataset_column
abacusai.dataset_version
abacusai.deployment
abacusai.deployment_auth_token
abacusai.document
abacusai.document_annotation
abacusai.document_store
abacusai.document_store_import
abacusai.drift_distribution
abacusai.drift_distributions
abacusai.feature
abacusai.feature_drift_summary
abacusai.feature_group
abacusai.feature_group_export
abacusai.feature_group_export_config
abacusai.feature_group_export_download_url
abacusai.feature_group_template
abacusai.feature_group_version
abacusai.feature_record
abacusai.file_connector
abacusai.file_connector_instructions
abacusai.file_connector_verification
abacusai.function_logs
abacusai.indexing_config
abacusai.model
abacusai.model_location
abacusai.model_metrics
abacusai.model_monitor
abacusai.model_monitor_summary
abacusai.model_monitor_version
abacusai.model_monitor_version_metric_data
abacusai.model_upload
abacusai.model_version
abacusai.modification_lock_info
abacusai.nested_feature
abacusai.null_violation
abacusai.organization_group
abacusai.point_in_time_feature
abacusai.point_in_time_group
abacusai.point_in_time_group_feature
abacusai.prediction_client
abacusai.prediction_dataset
abacusai.prediction_feature_group
abacusai.prediction_input
abacusai.prediction_metric
abacusai.prediction_metric_version
abacusai.problem_type
abacusai.project
abacusai.project_dataset
abacusai.project_validation
abacusai.range_violation
abacusai.refresh_pipeline_run
abacusai.refresh_policy
abacusai.refresh_schedule
abacusai.resolved_feature_group_template
abacusai.return_class
abacusai.schema
abacusai.streaming_auth_token
abacusai.streaming_connector
abacusai.training_config_options
abacusai.type_violation
abacusai.upload
abacusai.upload_part
abacusai.use_case
abacusai.use_case_requirements
abacusai.user
abacusai.user_exception
Package Contents
Classes
ApiClient – Abacus.AI API Client
ClientOptions – Options for configuring the ApiClient
ReadOnlyClient – Abacus.AI Read Only API Client. Only contains GET methods
PredictionClient – Abacus.AI Prediction API Client. Does not utilize authentication and only contains public prediction methods
Attributes
- class abacusai.ApiClient(api_key=None, server=None, client_options=None, skip_version_check=False)
Bases:
ReadOnlyClient
Abacus.AI API Client
- Parameters
api_key (str) – The api key to use as authentication to the server
server (str) – The base server url to use to send API requests to
client_options (ClientOptions) – Optional API client configurations
skip_version_check (bool) – If true, will skip checking the server’s current API version on initializing the client
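For reference, a minimal sketch of constructing a client; the key value is a placeholder, and the remaining arguments are left at their defaults:

from abacusai import ApiClient

# Instantiate the client with an API key generated for your organization.
client = ApiClient(api_key='YOUR_API_KEY')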
- create_dataset_from_pandas(feature_group_table_name, df, name=None)
[Deprecated] Creates a Dataset from a pandas dataframe
- create_dataset_version_from_pandas(table_name_or_id, df)
[Deprecated] Updates an existing dataset from a pandas dataframe
- create_feature_group_from_pandas_df(table_name, df)
Create a Feature Group from a local Pandas DataFrame.
- Parameters
table_name (str) – The table name to assign to the feature group created by this call
df (pandas.DataFrame) – The dataframe to upload
- Return type
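For example, a minimal sketch of creating a feature group from a local DataFrame, assuming an ApiClient instance named client; the table name and columns are invented for illustration:

import pandas as pd

# Build a small local DataFrame and register it as a feature group.
df = pd.DataFrame({'user_id': [1, 2, 3], 'score': [0.2, 0.5, 0.9]})
feature_group = client.create_feature_group_from_pandas_df(table_name='user_scores', df=df)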
- update_feature_group_from_pandas_df(table_name, df)
Updates a DATASET Feature Group from a local Pandas DataFrame.
- Parameters
table_name (str) – The table name to assign to the feature group created by this call
df (pandas.DataFrame) – The dataframe to upload
- Return type
- create_feature_group_from_spark_df(table_name, df)
Create a Feature Group from a local Spark DataFrame.
- Parameters
df (pyspark.sql.DataFrame) – The dataframe to upload
table_name (str) – The table name to assign to the feature group created by this call
- Return type
- update_feature_group_from_spark_df(table_name, df)
Updates a Feature Group from a local Spark DataFrame.
- Parameters
df (pyspark.sql.DataFrame) – The dataframe to upload
table_name (str) – The table name to assign to the feature group created by this call
should_wait_for_upload (bool) – Wait for dataframe to upload before returning. Some FeatureGroup methods, like materialization, may not work until upload is complete.
timeout (int, optional) – If waiting for upload, time out after this limit.
- Return type
- create_spark_df_from_feature_group_version(session, feature_group_version)
Create a Spark Dataframe in the provided Spark Session context, for a materialized Abacus Feature Group Version.
- Parameters
session (pyspark.sql.SparkSession) – Spark session
feature_group_version (str) – Feature group version to load from
- Returns
pyspark.sql.DataFrame
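A sketch of loading a materialized feature group version into Spark; the version ID is a placeholder and client is an ApiClient instance:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Load the materialized feature group version as a Spark DataFrame.
spark_df = client.create_spark_df_from_feature_group_version(
    session=spark, feature_group_version='FEATURE_GROUP_VERSION_ID')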
- create_model_from_functions(project_id, train_function, predict_function=None, training_input_tables=None, predict_many_function=None, initialize_function=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False)
Creates a model from a python function
- Parameters
project_id (str) – The project to create the model in
train_function (callable) – The training function callable to serialize and upload
predict_function (callable) – The predict function callable to serialize and upload
predict_many_function (callable) – The predict many function callable to serialize and upload
initialize_function (callable) – The initialize function callable to serialize and upload
training_input_tables (list) – The input table names of the feature groups to pass to the train function
cpu_size (str) – Size of the cpu for the training function
memory (int) – Memory (in GB) for the training function
training_config (dict) –
exclusive_run (bool) –
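A minimal sketch of registering a model from local functions; the function bodies and signatures are illustrative assumptions, since the expected signatures depend on how the training input tables are configured, and the project ID and table name are placeholders:

def train(training_df):
    # Fit some model object on the materialized training table and return it.
    return {'mean_score': training_df['score'].mean()}

def predict(model, query):
    # Return a prediction for a single query using the trained model object.
    return {'score': model['mean_score']}

model = client.create_model_from_functions(
    project_id='PROJECT_ID',
    train_function=train,
    predict_function=predict,
    training_input_tables=['user_scores'])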
- create_feature_group_from_python_function(function, table_name, input_tables, cpu_size=None, memory=None)
Creates a feature group from a python function
- Parameters
function (callable) – The function callable for the feature group
table_name (str) – The table name to give the feature group
input_tables (list) – The input table names of the feature groups as input to the feature group function
cpu_size (str) – Size of the cpu for the feature group function
memory (int) – Memory (in GB) for the feature group function
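A sketch of defining a feature group as a Python function over another feature group; the table names and the column referenced inside the function are placeholders:

def build_features(user_scores):
    # Input tables arrive as materialized DataFrames; return the transformed DataFrame.
    return user_scores.assign(score_pct=user_scores['score'] * 100)

feature_group = client.create_feature_group_from_python_function(
    function=build_features,
    table_name='user_scores_enriched',
    input_tables=['user_scores'])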
- create_algorithm_from_function(name, problem_type, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function=None, predict_function=None, predict_many_function=None, initialize_function=None, config_options=None, is_default_enabled=False, project_id=None)
Create a new algorithm, or update an existing algorithm if the name already exists
- Parameters
name (String) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed
problem_type (Enum string) – The type of the problem this algorithm will work on
train_function (callable) – The training function callable to serialize and upload
predict_function (callable) – The predict function callable to serialize and upload
predict_many_function (callable) – The predict many function callable to serialize and upload
initialize_function (callable) – The initialize function callable to serialize and upload
training_data_parameter_names_mapping (Dict) – The mapping from feature group types to training data parameter names in the train function
training_config_parameter_name (string) – The train config parameter name in the train function
config_options (Dict) – Map dataset types and configs to train function parameter names
is_default_enabled (bool) – Whether to train with the algorithm by default
project_id (Unique String Identifier) – The unique ID of the project
- update_algorithm_from_function(algorithm, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function=None, predict_function=None, predict_many_function=None, initialize_function=None, config_options=None, is_default_enabled=None)
Updates an existing algorithm identified by its name.
- Parameters
algorithm (String) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed
train_function (callable) – The training function callable to serialize and upload
predict_function (callable) – The predict function callable to serialize and upload
predict_many_function (callable) – The predict many function callable to serialize and upload
initialize_function (callable) – The initialize function callable to serialize and upload
training_data_parameter_names_mapping (Dict) – The mapping from feature group types to training data parameter names in the train function
training_config_parameter_name (string) – The train config parameter name in the train function
config_options (Dict) – Map dataset types and configs to train function parameter names
is_default_enabled (bool) – Whether to train with the algorithm by default
- get_train_function_input(project_id, training_table_names=None, training_data_parameter_name_override=None, training_config_parameter_name_override=None, training_config=None)
Get the input data for the train function to test locally.
- Parameters
project_id (String) – The id of the project
training_table_names (List) – A list of feature group tables used for training
training_data_parameter_name_override (Dict) – The mapping from feature group types to training data parameter names in the train function
training_config_parameter_name_override (String) – The train config parameter name in the train function
training_config (Dict) – A dictionary for training parameters for the algorithm
- train_model_with_algorithms(project_id, model_name, user_defined_algorithms, training_table_names, cpu_size='SMALL', memory=3, user_defined_algorithms_only=False, training_config=None, user_defined_algorithm_configs=None)
Train a model with provided user-defined algorithms.
- Parameters
project_id (String) – The id of the project
model_name (String) – The name of the model to train
user_defined_algorithms (List) – A list of user-defined algorithm names
training_table_names (List) – A list of feature group tables used for training
cpu_size (Enum) – How much cpu is needed for the user-defined algorithms during training
memory (Int) – How much memory in GB is needed for the user-defined algorithms during training
user_defined_algorithms_only (Boolean) – Whether to train only with user-defined algorithms, or to also include Abacus.AI algorithms
training_config (Dict) – A dictionary for model training parameters
user_defined_algorithm_configs (Dict) – Configs for each user-defined algorithm, key is algorithm name, value is the config serialized to json
- add_user_to_organization(email)
Invites a user to your organization. This method will send the specified email address an invitation link to join your organization.
- Parameters
email (str) – The email address to invite to your Organization.
- create_organization_group(group_name, permissions, default_group=False)
Creates a new Organization Group.
- Parameters
- Returns
Information about the created Organization Group
- Return type
- add_organization_group_permission(organization_group_id, permission)
Adds a permission to the specified Organization Group
- remove_organization_group_permission(organization_group_id, permission)
Removes a permission from the specified Organization Group
- delete_organization_group(organization_group_id)
Deletes the specified Organization Group from the organization.
- Parameters
organization_group_id (str) – The ID of the Organization Group
- add_user_to_organization_group(organization_group_id, email)
Adds a user to the specified Organization Group
- remove_user_from_organization_group(organization_group_id, email)
Removes a user from an Organization Group
- set_default_organization_group(organization_group_id)
Sets the default Organization Group that all new users that join an organization are automatically added to
- Parameters
organization_group_id (str) – The ID of the Organization Group
- delete_api_key(api_key_id)
Delete a specified API Key. You can use the “listApiKeys” method to find the list of all API Key IDs.
- Parameters
api_key_id (str) – The ID of the API key to delete.
- remove_user_from_organization(email)
Removes the specified user from the Organization. You can remove yourself; otherwise, you must be an Organization Administrator to use this method to remove other users from the organization.
- Parameters
email (str) – The email address of the user to remove from the Organization.
- create_project(name, use_case)
Creates a project with your specified project name and use case. Creating a project creates a container for all of the datasets and the models that are associated with a particular problem/project that you would like to work on. For example, if you want to create a model to detect fraud, you have to first create a project, upload datasets, create feature groups, and then create one or more models to get predictions for your use case.
- Parameters
name (str) – The project’s name
use_case (str) – The use case that the project solves. You can refer to our (guide on use cases)[https://api.abacus.ai/app/help/useCases] for further details of each use case. The following enums are currently available for you to choose from: LANGUAGE_DETECTION, NLP_SENTIMENT, NLP_QA, NLP_SEARCH, NLP_SENTENCE_BOUNDARY_DETECTION, NLP_CLASSIFICATION, NLP_SUMMARIZATION, NLP_DOCUMENT_VISUALIZATION, EMBEDDINGS_ONLY, MODEL_WITH_EMBEDDINGS, TORCH_MODEL_WITH_EMBEDDINGS, PYTHON_MODEL, NOTEBOOK_PYTHON_MODEL, DOCKER_MODEL, DOCKER_MODEL_WITH_EMBEDDINGS, CUSTOMER_CHURN, ENERGY, FINANCIAL_METRICS, CUMULATIVE_FORECASTING, FRAUD_ACCOUNT, FRAUD_THREAT, FRAUD_TRANSACTIONS, OPERATIONS_CLOUD, CLOUD_SPEND, TIMESERIES_ANOMALY_DETECTION, OPERATIONS_MAINTENANCE, OPERATIONS_INCIDENT, PERS_PROMOTIONS, PREDICTING, FEATURE_STORE, RETAIL, SALES_FORECASTING, SALES_SCORING, FEED_RECOMMEND, USER_RANKINGS, NAMED_ENTITY_RECOGNITION, USER_ITEM_AFFINITY, USER_RECOMMENDATIONS, USER_RELATED, VISION_SEGMENTATION, VISION, FEATURE_DRIFT, SCHEDULING, GENERIC_FORECASTING.
- Returns
This object represents the newly created project. For details refer to
- Return type
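For example, a sketch of creating a project for a churn use case; the project name is arbitrary and CUSTOMER_CHURN is one of the enums listed above:

# Create a project container for the churn use case.
project = client.create_project(name='Customer Churn', use_case='CUSTOMER_CHURN')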
- rename_project(project_id, name)
This method renames a project after it is created.
- delete_project(project_id)
Deletes a specified project from your organization.
This method deletes the project, trained models and deployments in the specified project. The datasets attached to the specified project remain available for use with other projects in the organization.
This method will not delete a project that contains active deployments. Be sure to stop all active deployments before you use the delete option.
Note: All projects, models, and deployments cannot be recovered once they are deleted.
- Parameters
project_id (str) – The unique ID of the project to delete.
- add_feature_group_to_project(feature_group_id, project_id, feature_group_type='CUSTOM_TABLE', feature_group_use=None)
Adds a feature group to a project.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
project_id (str) – The unique ID associated with the project.
feature_group_type (str) – The feature group type of the feature group. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under personalized recommendation use case.
feature_group_use (str) – The user-assigned feature group use, which allows for organizing project feature groups: DATA_WRANGLING, TRAINING_INPUT, BATCH_PREDICTION_INPUT
- remove_feature_group_from_project(feature_group_id, project_id)
Removes a feature group from a project.
- set_feature_group_type(feature_group_id, project_id, feature_group_type='CUSTOM_TABLE')
Update the feature group type in a project. The feature group must already be added to the project.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
project_id (str) – The unique ID associated with the project.
feature_group_type (str) – The feature group type to set the feature group as. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under personalized recommendation use case.
- use_feature_group_for_training(feature_group_id, project_id, use_for_training=True)
Use the feature group for model training input
- Parameters
- set_feature_mapping(project_id, feature_group_id, feature_name, feature_mapping, nested_column_name=None)
Set a column’s feature mapping. If the column mapping is single-use and already set in another column in this feature group, this call will first remove the other column’s mapping and move it to this column.
- Parameters
project_id (str) – The unique ID associated with the project.
feature_group_id (str) – The unique ID associated with the feature group.
feature_name (str) – The name of the feature.
feature_mapping (str) – The mapping of the feature in the feature group.
nested_column_name (str) – The name of the nested column.
- Returns
A list of objects that describes the resulting feature group’s schema after the feature’s featureMapping is set.
- Return type
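A sketch of assigning a feature mapping; the IDs are placeholders, and the mapping value shown is an assumed example since valid values depend on the project's use case:

schema = client.set_feature_mapping(
    project_id='PROJECT_ID',
    feature_group_id='FEATURE_GROUP_ID',
    feature_name='churned',
    feature_mapping='TARGET')  # placeholder mapping value; valid values depend on the use case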
- set_column_data_type(project_id, dataset_id, column, data_type)
Set a dataset’s column type.
- Parameters
project_id (str) – The unique ID associated with the project.
dataset_id (str) – The unique ID associated with the dataset.
column (str) – The name of the column.
data_type (str) – The type of the data in the column. CATEGORICAL, CATEGORICAL_LIST, NUMERICAL, TIMESTAMP, TEXT, EMAIL, LABEL_LIST, JSON, OBJECT_REFERENCE. Refer to the (guide on feature types)[https://api.abacus.ai/app/help/class/FeatureType] for more information. Note: Some ColumnMappings will restrict the options or explicitly set the DataType.
- Returns
A list of objects that describes the resulting dataset’s schema after the column’s dataType is set.
- Return type
- set_column_mapping(project_id, dataset_id, column, column_mapping)
Set a dataset’s column mapping. If the column mapping is single-use and already set in another column in this dataset, this call will first remove the other column’s mapping and move it to this column.
- Parameters
- Returns
A list of columns that describes the resulting dataset’s schema after the column’s columnMapping is set.
- Return type
- remove_column_mapping(project_id, dataset_id, column)
Removes a column mapping from a column in the dataset. Returns a list of all columns with their mappings once the change is made.
- Parameters
- Returns
A list of objects that describes the resulting dataset’s schema after the column’s columnMapping is removed.
- Return type
- create_feature_group(table_name, sql, description=None)
Creates a new feature group from a SQL statement.
- Parameters
- Returns
The created feature group
- Return type
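For example, a sketch of defining a feature group with SQL over an existing table; the table and column names are illustrative:

feature_group = client.create_feature_group(
    table_name='active_users',
    sql='SELECT user_id, score FROM user_scores WHERE score > 0.5',
    description='Users with a score above 0.5')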
- create_feature_group_from_template(table_name, feature_group_template_id, template_bindings=None, should_attach_feature_group_to_template=True, description=None)
Creates a new feature group from a feature group template.
- Parameters
table_name (str) – The unique name to be given to the feature group.
feature_group_template_id (str) – The unique ID associated with the template that will be used to create this feature group.
template_bindings (list) – Variable bindings that override the template’s variable values.
should_attach_feature_group_to_template (bool) – Set to False to create a feature group but not leave it attached to the template that created it.
description (str) – A user-friendly description of this feature group.
- Returns
The created feature group
- Return type
- create_feature_group_from_function(table_name, function_source_code, function_name, input_feature_groups=[], description=None, cpu_size=None, memory=None, package_requirements=None)
Creates a new Feature Group from user-provided code. The code language currently supported is Python.
If a list of input feature groups is supplied, the materialized feature groups will be provided to the function as arguments in the form of DataFrames (pandas DataFrames in the case of Python).
This method expects function_source_code to be a valid source file in the supported language which contains a function named function_name. This function needs to return a DataFrame when it is executed, and this DataFrame will be used as the materialized version of this feature group table.
- Parameters
table_name (str) – The unique name to be given to the feature group.
function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each of the parameters is a materialized DataFrame (same type as the function’s return value).
description (str) – The description for this feature group.
cpu_size (str) – Size of the cpu for the feature group function
memory (int) – Memory (in GB) for the feature group function
package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency
- Returns
The created feature group
- Return type
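A sketch of the same pattern with the source code passed as a string; the function, table, and column names are placeholders:

function_source_code = '''
def build_features(user_scores):
    # user_scores is the materialized DataFrame for the input feature group
    return user_scores[user_scores['score'] > 0.5]
'''

feature_group = client.create_feature_group_from_function(
    table_name='high_scores',
    function_source_code=function_source_code,
    function_name='build_features',
    input_feature_groups=['user_scores'])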
- create_feature_group_from_zip(table_name, function_name, module_name, input_feature_groups=None, description=None, cpu_size=None, memory=None, package_requirements=None)
Creates a new feature group from a ZIP file.
- Parameters
table_name (str) – The unique name to be given to the feature group.
function_name (str) – Name of the function found in the module that will be executed (on the optional inputs) to materialize this feature group.
module_name (str) – Path to the file with the feature group function.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each of the parameters is a materialized DataFrame (same type as the function’s return value).
description (str) – The description about the feature group.
cpu_size (str) – Size of the cpu for the feature group function
memory (int) – Memory (in GB) for the feature group function
package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency
- Returns
The Upload to upload the zip file to
- Return type
- create_feature_group_from_git(application_connector_id, branch_name, table_name, function_name, module_name, python_root=None, input_feature_groups=None, description=None, cpu_size=None, memory=None, package_requirements=None)
Creates a new feature group from a git repository.
- Parameters
application_connector_id (str) – The unique ID associated with the git application connector.
branch_name (str) – Name of the branch in the git repository to be used for training.
table_name (str) – The unique name to be given to the feature group.
function_name (str) – Name of the function found in the module that will be executed (on the optional inputs) to materialize this feature group.
module_name (str) – Path to the file with the feature group function.
python_root (str) – Path from the top level of the git repository to the directory containing the Python source code. If not provided, the default is the root of the git repository.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each of the parameters is a materialized DataFrame (same type as the function’s return value).
description (str) – The description about the feature group.
cpu_size (str) – Size of the cpu for the feature group function
memory (int) – Memory (in GB) for the feature group function
package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency
- Returns
The created feature group
- Return type
- create_sampling_feature_group(feature_group_id, table_name, sampling_config, description=None)
Creates a new feature group defined as a sample of rows from another feature group.
For efficiency, sampling is approximate unless otherwise specified. (E.g. the number of rows may vary slightly from what was requested).
- Parameters
feature_group_id (str) – The unique ID associated with the pre-existing feature group that will be sampled by this new feature group. I.e. the input for sampling.
table_name (str) – The unique name to be given to this sampling feature group.
sampling_config (dict) – JSON object (aka map) defining the sampling method and its parameters.
description (str) – A human-readable description of this feature group.
- Returns
The created feature group.
- Return type
- create_merge_feature_group(source_feature_group_id, table_name, merge_config, description=None)
Creates a new feature group defined as the union of other feature group versions.
- Parameters
source_feature_group_id (str) – ID corresponding to the dataset feature group that will have its versions merged into this feature group.
table_name (str) – The unique name to be given to this merge feature group.
merge_config (dict) – JSON object (aka map) defining the merging method and its parameters.
description (str) – A human-readable description of this feature group.
- Returns
The created feature group.
- Return type
- create_transform_feature_group(source_feature_group_id, table_name, transform_config, description=None)
Creates a new feature group defined as a pre-defined transform on another feature group.
- Parameters
source_feature_group_id (str) – ID corresponding to the feature group that will have the transformation applied.
table_name (str) – The unique name to be given to this transform feature group.
transform_config (dict) – JSON object (aka map) defining the transform and its parameters.
description (str) – A human-readable description of this feature group.
- Returns
The created feature group.
- Return type
- create_snapshot_feature_group(feature_group_version, table_name)
Creates a Snapshot Feature Group corresponding to a specific feature group version.
- Parameters
- Returns
Feature Group corresponding to the newly created Snapshot.
- Return type
- set_feature_group_sampling_config(feature_group_id, sampling_config)
Set a FeatureGroup’s sampling to the config values provided, so that the rows the FeatureGroup returns will be a sample of those it would otherwise have returned.
Currently, sampling is only for Sampling FeatureGroups, so this API only allows calling on that kind of FeatureGroup.
- Parameters
- Returns
The updated feature group.
- Return type
- set_feature_group_merge_config(feature_group_id, merge_config)
Set a MergeFeatureGroup’s merge config to the values provided, so that the feature group only returns a bounded range of an incremental dataset.
- set_feature_group_transform_config(feature_group_id, transform_config)
Set a TransformFeatureGroup’s transform config to the values provided.
- set_feature_group_schema(feature_group_id, schema)
Creates a new schema and points the feature group to the new feature group schema id.
- create_feature(feature_group_id, name, select_expression)
Creates a new feature in a Feature Group from a SQL select statement
- Parameters
- Returns
A feature group object with the newly added feature.
- Return type
- add_feature_group_tag(feature_group_id, tag)
Adds a tag to the feature group
- remove_feature_group_tag(feature_group_id, tag)
Removes a tag from the feature group
- add_feature_tag(feature_group_id, feature, tag)
- remove_feature_tag(feature_group_id, feature, tag)
- create_nested_feature(feature_group_id, nested_feature_name, table_name, using_clause, where_clause=None, order_clause=None)
Creates a new nested feature in a feature group from a SQL statement.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
nested_feature_name (str) – The name of the feature.
table_name (str) – The table name of the feature group to nest
using_clause (str) – The SQL join column or logic to join the nested table with the parent
where_clause (str) – A SQL where statement to filter the nested rows
order_clause (str) – A SQL clause to order the nested rows
- Returns
A feature group object with the newly added nested feature.
- Return type
- update_nested_feature(feature_group_id, nested_feature_name, table_name=None, using_clause=None, where_clause=None, order_clause=None, new_nested_feature_name=None)
Updates a previously existing nested feature in a feature group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
nested_feature_name (str) – The name of the feature to be updated.
table_name (str) – The name of the table.
using_clause (str) – The SQL join column or logic to join the nested table with the parent
where_clause (str) – A SQL where statement to filter the nested rows
order_clause (str) – A SQL clause to order the nested rows
new_nested_feature_name (str) – New name for the nested feature.
- Returns
A feature group object with the updated nested feature.
- Return type
- delete_nested_feature(feature_group_id, nested_feature_name)
Delete a nested feature.
- Parameters
- Returns
A feature group object without the deleted nested feature.
- Return type
- create_point_in_time_feature(feature_group_id, feature_name, history_table_name, aggregation_keys, timestamp_key, historical_timestamp_key, expression, lookback_window_seconds=None, lookback_window_lag_seconds=0, lookback_count=None, lookback_until_position=0)
Creates a new point in time feature in a feature group using another historical feature group, window spec and aggregate expression.
We use the aggregation keys and either the lookbackWindowSeconds or the lookbackCount values to perform the window aggregation for every row in the current feature group. If the window is specified in seconds, then all rows in the history table which match the aggregation keys and have historicalTimeFeature >= lookbackStartCount and < the current row’s timeFeature are considered. An optional lookbackWindowLagSeconds (positive or negative) can be used to offset the current value of the timeFeature. If this value is negative, we will look at future rows in the history table, so care must be taken to make sure that these rows are available in the online context when we are performing a lookup on this feature group. If the window is specified in counts, then we order the historical table rows aligning by time and consider rows from the window where the rank order is >= lookbackCount, including the row just prior to the current one. The lag is specified in terms of positions using lookbackUntilPosition.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
feature_name (str) – The name of the feature to create
history_table_name (str) – The table name of the history table.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature
historical_timestamp_key (str) – Name of feature which contains the historical timestamp.
expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.
lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.
lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
- Returns
A feature group object with the newly added point in time feature.
- Return type
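A sketch of a 30-day count aggregate defined as a point in time feature; all IDs, table names, and column names below are placeholders:

feature_group = client.create_point_in_time_feature(
    feature_group_id='FEATURE_GROUP_ID',
    feature_name='purchases_last_30d',
    history_table_name='purchase_events',
    aggregation_keys=['user_id'],
    timestamp_key='event_time',
    historical_timestamp_key='purchase_time',
    expression='COUNT(1)',
    lookback_window_seconds=30 * 24 * 3600)  # 30-day window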
- update_point_in_time_feature(feature_group_id, feature_name, history_table_name=None, aggregation_keys=None, timestamp_key=None, historical_timestamp_key=None, expression=None, lookback_window_seconds=None, lookback_window_lag_seconds=None, lookback_count=None, lookback_until_position=None, new_feature_name=None)
Updates an existing point in time feature in a feature group. See createPointInTimeFeature for detailed semantics.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
feature_name (str) – The name of the feature.
history_table_name (str) – The table name of the history table. If not specified, we use the current table to do a self join.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature
historical_timestamp_key (str) – Name of feature which contains the historical timestamp.
expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.
lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.
lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
new_feature_name (str) – New name for the point in time feature.
- Returns
A feature group object with the newly added point in time feature.
- Return type
- create_point_in_time_group(feature_group_id, group_name, window_key, aggregation_keys, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=0, lookback_count=None, lookback_until_position=0)
Create point in time group
- Parameters
feature_group_id (str) – The unique ID associated with the feature group to add the point in time group to.
group_name (str) – The name of the point in time group
window_key (str) – Name of feature to use for ordering the rows on the source table
aggregation_keys (list) – List of keys on the source table to use for the window aggregation.
history_table_name (str) – The table to use for aggregating, if not provided, the source table will be used
history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used
history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys
lookback_window (float) – Number of seconds in the past from the current time for start of the window. If 0, the lookback will include all rows.
lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
- Returns
The feature group after the point in time group has been created
- Return type
- update_point_in_time_group(feature_group_id, group_name, window_key=None, aggregation_keys=None, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=None, lookback_count=None, lookback_until_position=None)
Update point in time group
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
group_name (str) – The name of the point in time group
window_key (str) – Name of feature which contains the timestamp value for the point in time feature
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
history_table_name (str) – The table to use for aggregating, if not provided, the source table will be used
history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used
history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys
lookback_window (float) – Number of seconds in the past from the current time for start of the window.
lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row)
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.
- Returns
The feature group after the update has been applied
- Return type
- delete_point_in_time_group(feature_group_id, group_name)
Delete point in time group
- Parameters
- Returns
The feature group after the point in time group has been deleted
- Return type
- create_point_in_time_group_feature(feature_group_id, group_name, name, expression)
Create point in time group feature
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
group_name (str) – The name of the point in time group
name (str) – The name of the feature to add to the point in time group
expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.
- Returns
The feature group after the update has been applied
- Return type
- update_point_in_time_group_feature(feature_group_id, group_name, name, expression)
Update a feature’s SQL expression in a point in time group
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
group_name (str) – The name of the point in time group
name (str) – The name of the feature to add to the point in time group
expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.
- Returns
The feature group after the update has been applied
- Return type
- set_feature_type(feature_group_id, feature, feature_type)
Set a feature’s type in a feature group. Specify the feature group ID, feature name and feature type, and the method will return the new column with the resulting changes reflected.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
feature (str) – The name of the feature.
feature_type (str) – The machine learning type of the data in the feature. CATEGORICAL, CATEGORICAL_LIST, NUMERICAL, TIMESTAMP, TEXT, EMAIL, LABEL_LIST, JSON, OBJECT_REFERENCE Refer to the (guide on feature types)[https://api.abacus.ai/app/help/class/FeatureType] for more information. Note: Some FeatureMappings will restrict the options or explicitly set the FeatureType.
- Returns
The feature group after the data_type is applied
- Return type
- invalidate_streaming_feature_group_data(feature_group_id, invalid_before_timestamp)
Invalidates all streaming data with timestamp before invalidBeforeTimestamp
- concatenate_feature_group_data(feature_group_id, source_feature_group_id, merge_type='UNION', replace_until_timestamp=None, skip_materialize=False)
Concatenates data from one feature group to another. Feature groups can be merged if their schemas are compatible, they have the special updateTimestampKey column, and, if set, the primaryKey column. The second operand in the concatenate operation will be appended to the first operand (merge target).
- Parameters
feature_group_id (str) – The destination feature group.
source_feature_group_id (str) – The feature group to concatenate with the destination feature group.
merge_type (str) – UNION or INTERSECTION
replace_until_timestamp (int) – The unix timestamp to specify the point till which we will replace data from the source feature group.
skip_materialize (bool) – If true, will not materialize the concatenated feature group
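For example, a sketch of appending a source feature group onto a destination; both IDs are placeholders:

client.concatenate_feature_group_data(
    feature_group_id='DESTINATION_FEATURE_GROUP_ID',
    source_feature_group_id='SOURCE_FEATURE_GROUP_ID',
    merge_type='UNION')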
- remove_concatenation_config(feature_group_id)
Removes the concatenation config on a destination feature group.
- Parameters
feature_group_id (str) – The unique ID of the destination feature group whose concatenation configuration will be removed
- set_feature_group_indexing_config(feature_group_id, primary_key=None, update_timestamp_key=None, lookup_keys=None)
Sets various attributes of the feature group used for deployment lookups and streaming updates.
- Parameters
feature_group_id (str) – The feature group
primary_key (str) – Name of feature which defines the primary key of the feature group.
update_timestamp_key (str) – Name of feature which defines the update timestamp of the feature group - used in concatenation and primary key deduplication.
lookup_keys (list) – List of feature names which can be used in the lookup api to restrict the computation to a set of dataset rows. These feature names have to correspond to underlying dataset columns.
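A sketch of configuring indexing for deployment lookups and streaming updates; the feature group ID and key names are placeholders:

client.set_feature_group_indexing_config(
    feature_group_id='FEATURE_GROUP_ID',
    primary_key='user_id',
    update_timestamp_key='updated_at',
    lookup_keys=['user_id'])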
- update_feature_group(feature_group_id, description=None)
Modifies an existing feature group
- Parameters
- Returns
The updated feature group object.
- Return type
- detach_feature_group_from_template(feature_group_id)
Update a feature group to detach it from a template.
Currently, this converts the feature group into a SQL feature group rather than a template feature group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
- Returns
The updated feature group
- Return type
- update_feature_group_template_bindings(feature_group_id, template_bindings=None)
Update the feature group template bindings for a template feature group.
- Parameters
- Returns
The updated feature group
- Return type
- update_feature_group_sql_definition(feature_group_id, sql)
Updates the SQL statement for a feature group.
- Parameters
- Returns
The updated feature group
- Return type
- update_dataset_feature_group_feature_expression(feature_group_id, feature_expression)
Updates the SQL feature expression for a dataset feature group’s custom features
- Parameters
- Returns
The updated feature group
- Return type
- update_feature_group_function_definition(feature_group_id, function_source_code=None, function_name=None, input_feature_groups=None, cpu_size=None, memory=None, package_requirements=None)
Updates the function definition for a feature group created using createFeatureGroupFromFunction
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each of the parameters is a materialized DataFrame (same type as the function’s return value).
cpu_size (str) – Size of the cpu for the feature group function
memory (int) – Memory (in GB) for the feature group function
package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency
- Returns
The updated feature group
- Return type
- update_feature_group_zip(feature_group_id, function_name, module_name, input_feature_groups=None, cpu_size=None, memory=None, package_requirements=None)
Updates the zip for a feature group created using createFeatureGroupFromZip
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.
module_name (str) – Path to the file with the feature group function.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each of the parameters is a materialized DataFrame (same type as the function’s return value).
cpu_size (str) – Size of the cpu for the feature group function
memory (int) – Memory (in GB) for the feature group function
package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency
- Returns
The Upload to upload the zip file to
- Return type
- update_feature_group_git(feature_group_id, application_connector_id=None, branch_name=None, python_root=None, function_name=None, module_name=None, input_feature_groups=None, cpu_size=None, memory=None, package_requirements=None)
Updates a feature group created using createFeatureGroupFromGit
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
application_connector_id (str) – The unique ID associated with the git application connector.
branch_name (str) – Name of the branch in the git repository to be used for training.
python_root (str) – Path from the top level of the git repository to the directory containing the Python source code. If not provided, the default is the root of the git repository.
function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.
module_name (str) – Path to the file with the feature group function.
input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each of the parameters is a materialized DataFrame (same type as the function’s return value).
cpu_size (str) – Size of the cpu for the feature group function
memory (int) – Memory (in GB) for the feature group function
package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency
- Returns
The updated FeatureGroup
- Return type
- update_feature(feature_group_id, name, select_expression=None, new_name=None)
Modifies an existing feature in a feature group. A user needs to specify the name and feature group ID and either a SQL statement or a new name to update the feature.
- Parameters
- Returns
The updated feature group object.
- Return type
- export_feature_group_version_to_file_connector(feature_group_version, location, export_file_format, overwrite=False)
Export Feature group to File Connector.
- Parameters
- Returns
The FeatureGroupExport instance
- Return type
- export_feature_group_version_to_database_connector(feature_group_version, database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)
Export Feature group to Database Connector.
- Parameters
feature_group_version (str) – The Feature Group instance id to export.
database_connector_id (str) – Database connector to export to.
object_name (str) – The database object to write to
write_mode (str) – Either INSERT or UPSERT
database_feature_mapping (dict) – A key/value pair JSON Object of “database connector column” -> “feature name” pairs.
id_column (str) – Required if mode is UPSERT. Indicates which database column should be used as the lookup key for UPSERT
additional_id_columns (list) – For database connectors which support it, additional ID columns to use as a complex key for upserting
- Returns
The FeatureGroupExport instance
- Return type
- export_feature_group_version_to_console(feature_group_version, export_file_format)
Export Feature group to console.
- Parameters
- Returns
The FeatureGroupExport instance
- Return type
- set_feature_group_modifier_lock(feature_group_id, locked=True)
Locks a feature group to prevent it from being modified.
- add_user_to_feature_group_modifiers(feature_group_id, email)
Adds user to a feature group.
- add_organization_group_to_feature_group_modifiers(feature_group_id, organization_group_id)
Add Organization to a feature group.
- remove_user_from_feature_group_modifiers(feature_group_id, email)
Removes user from a feature group.
- remove_organization_group_from_feature_group_modifiers(feature_group_id, organization_group_id)
Removes Organization from a feature group.
- delete_feature(feature_group_id, name)
Removes an existing feature from a feature group. A user needs to specify the name of the feature to be deleted and the feature group ID.
- Parameters
- Returns
The updated feature group object.
- Return type
- delete_feature_group(feature_group_id)
Removes an existing feature group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
- create_feature_group_version(feature_group_id, variable_bindings=None)
Creates a snapshot for a specified feature group.
- Parameters
- Returns
A feature group version.
- Return type
- create_feature_group_template(feature_group_id, name, template_sql, template_variables, description=None, template_bindings=None, should_attach_feature_group_to_template=False)
Create a feature group template.
- Parameters
feature_group_id (str) – Identifier of the feature group this template was created from.
name (str) – The user-friendly name for this feature group template.
template_sql (str) – The template sql that will be resolved by applying values from the template variables to generate sql for a feature group.
template_variables (list) – The template variables for resolving the template.
description (str) – A description of this feature group template
template_bindings (list) – If the feature group will be attached to the newly created template, set these variable bindings on that feature group.
should_attach_feature_group_to_template (bool) – Set to True to convert the feature group to a template feature group and attach it to the newly created template.
- Returns
The created feature group template
- Return type
- delete_feature_group_template(feature_group_template_id)
Delete an existing feature group template.
- Parameters
feature_group_template_id (str) – The unique ID associated with the feature group template.
- update_feature_group_template(feature_group_template_id, template_sql=None, template_variables=None)
Update a feature group template.
- Parameters
- Returns
The updated feature group template.
- Return type
- preview_feature_group_template_resolution(feature_group_template_id=None, template_bindings=None, template_sql=None, template_variables=None, should_validate=True)
Resolve template sql using template variables and template bindings.
- Parameters
feature_group_template_id (str) – If specified, use this template, otherwise assume an empty template.
template_bindings (list) – Values that override the template variable values specified by the template.
template_sql (str) – If specified, use this as the template sql instead of the feature group template’s sql.
template_variables (list) – Template variables to use. If a template is provided, this overrides the template’s template variables.
should_validate (bool) –
- Returns
None
- Return type
- upload_part(upload_id, part_number, part_data)
Uploads a part of a large dataset file from your bucket to our system. Our system currently supports a size of up to 5GB for a part of a full file and a size of up to 5TB for the full file. Note that each part must be >=5MB in size, unless it is the last part in the sequence of parts for the full file.
- Parameters
upload_id (str) – A unique identifier for this upload
part_number (int) – The 1-indexed number denoting the position of the file part in the sequence of parts for the full file.
part_data (io.TextIOBase) – The multipart/form-data for the current part of the full file.
- Returns
The object ‘UploadPart’ which encapsulates the hash and the etag for the part that got uploaded.
- Return type
- mark_upload_complete(upload_id)
Marks an upload process as complete.
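A sketch of a chunked upload loop followed by marking the upload complete. It assumes an Upload object named upload (for example from create_dataset_from_upload, documented below) exposing an upload_id attribute, and that a binary chunk wrapped in io.BytesIO is accepted as part_data; both are assumptions here:

import io

PART_SIZE = 5 * 1024 * 1024  # each part must be >= 5MB, except possibly the last
with open('data.csv', 'rb') as f:
    part_number = 1
    while True:
        chunk = f.read(PART_SIZE)
        if not chunk:
            break
        # Upload the next 1-indexed part of the file.
        client.upload_part(upload_id=upload.upload_id,
                           part_number=part_number,
                           part_data=io.BytesIO(chunk))
        part_number += 1
client.mark_upload_complete(upload_id=upload.upload_id)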
- create_dataset_from_file_connector(name, table_name, location, file_format=None, refresh_schedule=None, csv_delimiter=None, filename_column=None, start_prefix=None, until_prefix=None, location_date_format=None, date_format_lookback_days=None, incremental=False)
Creates a dataset from a file located in cloud storage, such as Amazon AWS S3, using the specified dataset name and location.
- Parameters
name (str) – The name for the dataset.
table_name (str) – Organization-unique table name or the name of the feature group table to create using the source table.
location (str) – The URI location format of the dataset source. The URI location format needs to be specified to match the location_date_format when location_date_format is specified. Ex. Location = s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/* when location_date_format is specified. The URI location format needs to include both the start_prefix and until_prefix when both are specified. Ex. Location = s3://bucket1/dir1/* includes both s3://bucket1/dir1/dir2/event_date=2021-08-02/* and s3://bucket1/dir1/dir2/event_date=2021-08-08/*
file_format (str) – The file format of the dataset.
refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
csv_delimiter (str) – If the file format is CSV, use a specific csv delimiter.
filename_column (str) – Adds a new column to the dataset with the external URI path.
start_prefix (str) – The start prefix (inclusive) for a range based search on a cloud storage location URI.
until_prefix (str) – The end prefix (exclusive) for a range based search on a cloud storage location URI.
location_date_format (str) – The date format in which the data is partitioned in the cloud storage location. E.g., if the data is partitioned as s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/dir4/filename.parquet, then the location_date_format is YYYY-MM-DD. This format needs to be consistent across all files within the specified location.
date_format_lookback_days (int) – The number of days to look back from the current day for import locations that are date partitioned. E.g., import date, 2021-06-04, with date_format_lookback_days = 3 will retrieve data for all the dates in the range [2021-06-02, 2021-06-04].
incremental (bool) – Signifies if the dataset is an incremental dataset.
- Returns
The dataset created.
- Return type
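For example, a sketch of importing files from a verified cloud storage location; the bucket path mirrors the example above and is a placeholder:

dataset = client.create_dataset_from_file_connector(
    name='Events',
    table_name='events',
    location='s3://bucket1/dir1/*',
    file_format='CSV',
    csv_delimiter=',')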
- create_dataset_version_from_file_connector(dataset_id, location=None, file_format=None, csv_delimiter=None)
Creates a new version of the specified dataset.
- Parameters
dataset_id (str) – The unique ID associated with the dataset.
location (str) – A new external URI to import the dataset from. If not specified, the last location will be used.
file_format (str) – The fileFormat to be used. If not specified, the service will try to detect the file format.
csv_delimiter (str) – If the file format is CSV, use a specific csv delimiter.
- Returns
The new Dataset Version created.
- Return type
- create_dataset_from_database_connector(name, table_name, database_connector_id, object_name=None, columns=None, query_arguments=None, refresh_schedule=None, sql_query=None, incremental=False, timestamp_column=None)
Creates a dataset from a Database Connector
- Parameters
name (str) – The name for the dataset to be attached.
table_name (str) – Organization-unique table name
database_connector_id (str) – The Database Connector to import the dataset from
object_name (str) – If applicable, the name/id of the object in the service to query.
columns (str) – The columns to query from the external service object.
query_arguments (str) – Additional query arguments to filter the data
refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
sql_query (str) – The full SQL query to use when fetching data. If present, this parameter will override objectName, columns, timestampColumn, and queryArguments
incremental (bool) – Signifies if the dataset is an incremental dataset.
timestamp_column (str) – If dataset is incremental, this is the column name of the required column in the dataset. This column must contain timestamps in descending order which are used to determine the increments of the incremental dataset.
- Returns
The created dataset.
- Return type
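A hedged sketch of an incremental import through a database connector; the connector ID, object name, and columns are placeholders, and client is assumed to be an initialized abacusai.ApiClient:

    # Incrementally import an ACCOUNTS table, keyed on a descending timestamp column.
    dataset = client.create_dataset_from_database_connector(
        name='CRM Accounts',
        table_name='crm_accounts',
        database_connector_id='database_connector_id_placeholder',
        object_name='ACCOUNTS',
        columns='account_id, account_name, created_at',
        refresh_schedule='0 3 * * *',
        incremental=True,
        timestamp_column='created_at',
    )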
- create_dataset_from_application_connector(name, table_name, application_connector_id, object_id=None, start_timestamp=None, end_timestamp=None, refresh_schedule=None)
Creates a dataset from an Application Connector
- Parameters
name (str) – The name for the dataset
table_name (str) – Organization-unique table name
application_connector_id (str) – The unique application connector to download data from
object_id (str) – If applicable, the id of the object in the service to query.
start_timestamp (int) – The Unix timestamp of the start of the period that will be queried.
end_timestamp (int) – The Unix timestamp of the end of the period that will be queried.
refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
- Returns
The created dataset.
- Return type
- create_dataset_version_from_database_connector(dataset_id, object_name=None, columns=None, query_arguments=None, sql_query=None)
Creates a new version of the specified dataset
- Parameters
dataset_id (str) – The unique ID associated with the dataset.
object_name (str) – If applicable, the name/id of the object in the service to query. If not specified, the last name will be used.
columns (str) – The columns to query from the external service object. If not specified, the last columns will be used.
query_arguments (str) – Additional query arguments to filter the data. If not specified, the last arguments will be used.
sql_query (str) – The full SQL query to use when fetching data. If present, this parameter will override objectName, columns, and queryArguments
- Returns
The new Dataset Version created.
- Return type
- create_dataset_version_from_application_connector(dataset_id, object_id=None, start_timestamp=None, end_timestamp=None)
Creates a new version of the specified dataset
- Parameters
dataset_id (str) – The unique ID associated with the dataset.
object_id (str) – If applicable, the id of the object in the service to query. If not specified, the last name will be used.
start_timestamp (int) – The Unix timestamp of the start of the period that will be queried.
end_timestamp (int) – The Unix timestamp of the end of the period that will be queried.
- Returns
The new Dataset Version created.
- Return type
- create_dataset_from_upload(name, table_name, file_format=None, csv_delimiter=None)
Creates a dataset and returns an upload ID that can be used to upload a file.
- Parameters
- Returns
A reference to be used when uploading file parts.
- Return type
- create_dataset_version_from_upload(dataset_id, file_format=None)
Creates a new version of the specified dataset using a local file upload.
- create_streaming_dataset(name, table_name, project_id=None, dataset_type=None)
Creates a streaming dataset. Use a streaming dataset if your dataset is receiving information from multiple sources over an extended period of time.
- Parameters
name (str) – The name for the dataset.
table_name (str) – The feature group table name to create for this dataset
project_id (str) – The project to create the streaming dataset for.
dataset_type (str) – The dataset has to be a type that is associated with the use case of your project. Please see the [Use Case Documentation](https://api.abacus.ai/app/help/useCases) for the datasetTypes that are supported per use case.
- Returns
The streaming dataset created.
- Return type
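A minimal sketch, assuming an initialized client; the dataset_type string is illustrative and must be one supported by your project's use case:

    # Create a streaming dataset that will receive rows appended over time.
    streaming_dataset = client.create_streaming_dataset(
        name='Click Events',
        table_name='click_events',
        project_id='project_id_placeholder',
        dataset_type='USER_ITEM_INTERACTIONS',  # illustrative; check your use case docs
    )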
- snapshot_streaming_data(dataset_id)
Snapshots the current data in the streaming dataset for training.
- Parameters
dataset_id (str) – The unique ID associated with the dataset.
- Returns
The new Dataset Version created.
- Return type
- set_dataset_column_data_type(dataset_id, column, data_type)
Set a column’s type in a specified dataset.
- Parameters
dataset_id (str) – The unique ID associated with the dataset.
column (str) – The name of the column.
data_type (str) – The type of the data in the column: INTEGER, FLOAT, STRING, DATE, DATETIME, BOOLEAN, LIST, or STRUCT. Refer to the [guide on data types](https://api.abacus.ai/app/help/class/DataType) for more information. Note: Some ColumnMappings will restrict the options or explicitly set the DataType.
- Returns
The dataset and schema after the data_type has been set
- Return type
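For example, to force a column to be treated as a DATETIME (assuming an initialized client and a placeholder dataset ID):

    schema = client.set_dataset_column_data_type(
        dataset_id='dataset_id_placeholder',
        column='signup_date',
        data_type='DATETIME',
    )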
- create_dataset_from_streaming_connector(name, table_name, streaming_connector_id, streaming_args=None, refresh_schedule=None)
Creates a dataset from a Streaming Connector
- Parameters
name (str) – The name for the dataset to be attached.
table_name (str) – Organization-unique table name
streaming_connector_id (str) – The Streaming Connector to import the dataset from
streaming_args (dict) – Dict of arguments to read data from the streaming connector
refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
- Returns
The created dataset.
- Return type
- set_streaming_retention_policy(dataset_id, retention_hours=None, retention_row_count=None)
Sets the streaming retention policy
- rename_database_connector(database_connector_id, name)
Renames a Database Connector
- rename_application_connector(application_connector_id, name)
Renames an Application Connector
- verify_database_connector(database_connector_id)
Checks to see if Abacus.AI can access the database.
- Parameters
database_connector_id (str) – The unique identifier for the database connector.
- verify_file_connector(bucket)
Checks to see if Abacus.AI can access the bucket.
- Parameters
bucket (str) – The bucket to test.
- Returns
The Result of the verification.
- Return type
- delete_database_connector(database_connector_id)
Delete a database connector.
- Parameters
database_connector_id (str) – The unique identifier for the database connector.
- delete_application_connector(application_connector_id)
Delete an application connector.
- Parameters
application_connector_id (str) – The unique identifier for the application connector.
- delete_file_connector(bucket)
Removes a connected service from the specified organization.
- Parameters
bucket (str) – The fully qualified URI of the bucket to remove.
- verify_application_connector(application_connector_id)
Checks to see if Abacus.AI can access the Application.
- Parameters
application_connector_id (str) – The unique identifier for the application connector.
- set_azure_blob_connection_string(bucket, connection_string)
Authenticates specified Azure Blob Storage bucket using an authenticated Connection String.
- Parameters
- Returns
An object with the roleArn and verification status for the specified bucket.
- Return type
- verify_streaming_connector(streaming_connector_id)
Checks to see if Abacus.AI can access the streaming connector.
- Parameters
streaming_connector_id (str) – The unique identifier for the streaming connector.
- rename_streaming_connector(streaming_connector_id, name)
Renames a Streaming Connector
- delete_streaming_connector(streaming_connector_id)
Delete a streaming connector.
- Parameters
streaming_connector_id (str) – The unique identifier for the streaming connector.
- create_streaming_token()
Creates a streaming token for the specified project. Streaming tokens are used to authenticate requests to append data to streaming datasets.
- Returns
The streaming token.
- Return type
- delete_streaming_token(streaming_token)
Deletes the specified streaming token.
- Parameters
streaming_token (str) – The streaming token to delete.
- attach_dataset_to_project(dataset_id, project_id, dataset_type)
[DEPRECATED] Attaches the dataset to the project.
Use this method to attach a dataset that is already in the organization to another project. The dataset type is required to let the AI engine know what type of schema should be used.
- Parameters
dataset_id (str) – The dataset to attach.
project_id (str) – The project to attach the dataset to.
dataset_type (str) – The dataset has to be a type that is associated with the use case of your project. Please see the [Use Case Documentation](https://api.abacus.ai/app/help/useCases) for the datasetTypes that are supported per use case.
- Returns
An array of column descriptions.
- Return type
- remove_dataset_from_project(dataset_id, project_id)
[DEPRECATED] Removes a dataset from a project.
- rename_dataset(dataset_id, name)
Rename a dataset.
- delete_dataset(dataset_id)
Deletes the specified dataset from the organization.
The dataset cannot be deleted if it is currently attached to a project.
- Parameters
dataset_id (str) – The dataset to delete.
- train_model(project_id, name=None, training_config=None, feature_group_ids=None, refresh_schedule=None, custom_algorithms=None, custom_algorithms_only=False, custom_algorithm_configs=None, cpu_size=None, memory=None)
Trains a model for the specified project.
Use this method to train a model in this project. This method supports user-specified training configurations defined in the getTrainingConfigOptions method.
- Parameters
project_id (str) – The unique ID associated with the project.
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.
training_config (dict) – The training config key/value pairs used to train this model.
feature_group_ids (list) – List of feature group ids provided by the user to train the model on.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model.
custom_algorithms (list) – List of user-defined algorithms to train.
custom_algorithms_only (bool) – Whether to run only custom algorithms.
custom_algorithm_configs (dict) – Configs for each user-defined algorithm; the key is the algorithm name and the value is the config serialized to JSON.
cpu_size (str) – Size of the cpu for the user-defined algorithms during train.
memory (int) – Memory (in GB) for the user-defined algorithms during train.
- Returns
The new model which is being trained.
- Return type
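A hedged sketch of kicking off training; client is an initialized abacusai.ApiClient, the IDs are placeholders, and the training_config key shown is illustrative only — valid keys come from get_training_config_options for your use case:

    model = client.train_model(
        project_id='project_id_placeholder',
        name='Demand Forecast Model',
        feature_group_ids=['feature_group_id_placeholder'],
        training_config={'TEST_SPLIT': 10},  # illustrative key; see get_training_config_options
        refresh_schedule='0 6 * * 1',        # retrain every Monday at 06:00 UTC
    )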
- create_model_from_python(project_id, function_source_code, train_function_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, name=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False, package_requirements=None)
Initializes a new Model from user-provided Python code. If a list of input feature groups is supplied,
the materialized feature groups for those inputs will be passed as arguments to the train and predict functions.
This method expects functionSourceCode to be a valid Python source file containing functions named trainFunctionName and predictFunctionName. The function named trainFunctionName returns the ModelVersion that results from training the model, while the function named predictFunctionName has no well-defined return type, as it returns the prediction it makes, which can be anything.
- Parameters
project_id (str) – The unique ID associated with the project.
function_source_code (str) – Contents of a valid python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (the same type as the function's return value).
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.
initialize_function_name (str) – Name of the function found in the source code that initializes the trained model before it is used to make predictions.
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”
cpu_size (str) – Size of the cpu for the model training function
memory (int) – Memory (in GB) for the model training function
training_config (dict) – Training configuration
exclusive_run (bool) – Decides whether this model will run exclusively or alongside other Abacus.AI algorithms.
package_requirements (dict) – JSON with key/value pairs mapping each dependency package to its version.
- Returns
The new model, which has not been trained.
- Return type
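A minimal sketch of registering Python training/prediction code. The inner train/predict bodies are illustrative only, and the exact arguments Abacus.AI passes to them (beyond the materialized training tables) are an assumption here; client and all IDs are placeholders:

    source = '''
    def train(sales_events):
        # `sales_events` arrives as a materialized pandas DataFrame (illustrative column name)
        return {"mean_target": sales_events["target"].mean()}

    def predict(model, query):
        # `query` signature is an assumption for this sketch
        return {"prediction": model["mean_target"]}
    '''

    model = client.create_model_from_python(
        project_id='project_id_placeholder',
        function_source_code=source,
        train_function_name='train',
        predict_function_name='predict',
        training_input_tables=['sales_events'],
        memory=16,
    )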
- create_model_from_zip(project_id, train_function_name, train_module_name, predict_module_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, name=None, cpu_size=None, memory=None, package_requirements=None)
Initializes a new Model from a user-provided zip file containing Python code. If a list of input feature groups is supplied,
the materialized feature groups for those inputs will be passed as arguments to the train and predict functions.
This method expects trainModuleName and predictModuleName to be valid Python source files that contain the functions named trainFunctionName and predictFunctionName, respectively. The function named trainFunctionName returns the ModelVersion that results from training the model, while the function named predictFunctionName has no well-defined return type, as it returns the prediction it makes, which can be anything.
- Parameters
project_id (str) – The unique ID associated with the project.
train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.
train_module_name (str) – Full path of the module that contains the train function from the root of the zip.
predict_module_name (str) – Full path of the module that contains the predict function from the root of the zip.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (the same type as the function's return value).
predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.
cpu_size (str) – Size of the cpu for the model training function
memory (int) – Memory (in GB) for the model training function
package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency
- Returns
None
- Return type
- create_model_from_git(project_id, application_connector_id, branch_name, train_function_name, train_module_name, predict_module_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, python_root=None, name=None, cpu_size=None, memory=None, package_requirements=None)
Initializes a new Model from a user-provided git repository containing Python code. If a list of input feature groups is supplied,
the materialized feature groups for those inputs will be passed as arguments to the train and predict functions.
This method expects trainModuleName and predictModuleName to be valid Python source files that contain the functions named trainFunctionName and predictFunctionName, respectively. The function named trainFunctionName returns the ModelVersion that results from training the model, while the function named predictFunctionName has no well-defined return type, as it returns the prediction it makes, which can be anything.
- Parameters
project_id (str) – The unique ID associated with the project.
application_connector_id (str) – The unique ID associated with the git application connector.
branch_name (str) – Name of the branch in the git repository to be used for training.
train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.
train_module_name (str) – Full path of the module that contains the train function from the root of the repository.
predict_module_name (str) – Full path of the module that contains the predict function from the root of the repository.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (the same type as the function's return value).
predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) –
python_root (str) – Path from the top level of the git repository to the directory containing the Python source code. If not provided, the default is the root of the git repository.
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.
cpu_size (str) – Size of the cpu for the model training function
memory (int) – Memory (in GB) for the model training function
package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency
- Returns
None
- Return type
- rename_model(model_id, name)
Renames a model
- update_python_model(model_id, function_source_code=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, training_input_tables=None, cpu_size=None, memory=None, package_requirements=None)
Updates an existing Python Model using user-provided Python code. If a list of input feature groups is supplied,
the materialized feature groups for those inputs will be passed as arguments to the train and predict functions.
This method expects functionSourceCode to be a valid Python source file containing functions named trainFunctionName and predictFunctionName. The function named trainFunctionName returns the ModelVersion that results from training the model, while the function named predictFunctionName has no well-defined return type, as it returns the prediction it makes, which can be anything.
- Parameters
model_id (str) – The unique ID associated with the Python model to be changed.
function_source_code (str) – Contents of a valid python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the source code that will be executed to run batch predictions through the model. It is not executed when this function is run.
initialize_function_name (str) – Name of the function found in the source code that initializes the trained model before it is used to make predictions.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (the same type as the function's return value).
cpu_size (str) – Size of the cpu for the model training function
memory (int) – Memory (in GB) for the model training function
package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency
- Returns
The updated model
- Return type
- update_python_model_zip(model_id, train_function_name=None, predict_function_name=None, predict_many_function_name=None, train_module_name=None, predict_module_name=None, training_input_tables=None, cpu_size=None, memory=None, package_requirements=None)
Updates an existing Python Model using a provided zip file. If a list of input feature groups is supplied,
the materialized feature groups for those inputs will be passed as arguments to the train and predict functions.
This method expects trainModuleName and predictModuleName to be valid Python source files that contain the functions named trainFunctionName and predictFunctionName, respectively. The function named trainFunctionName returns the ModelVersion that results from training the model, while the function named predictFunctionName has no well-defined return type, as it returns the prediction it makes, which can be anything.
- Parameters
model_id (str) – The unique ID associated with the Python model to be changed.
train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.
train_module_name (str) – Full path of the module that contains the train function from the root of the zip.
predict_module_name (str) – Full path of the module that contains the predict function from the root of the zip.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (the same type as the function's return value).
cpu_size (str) – Size of the cpu for the model training function
memory (int) – Memory (in GB) for the model training function
package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency
- Returns
The updated model
- Return type
- update_python_model_git(model_id, application_connector_id=None, branch_name=None, python_root=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, train_module_name=None, predict_module_name=None, training_input_tables=None, cpu_size=None, memory=None)
Updates an existing Python Model using an existing git application connector. If a list of input feature groups is supplied,
the materialized feature groups for those inputs will be passed as arguments to the train and predict functions.
This method expects trainModuleName and predictModuleName to be valid Python source files that contain the functions named trainFunctionName and predictFunctionName, respectively. The function named trainFunctionName returns the ModelVersion that results from training the model, while the function named predictFunctionName has no well-defined return type, as it returns the prediction it makes, which can be anything.
- Parameters
model_id (str) – The unique ID associated with the Python model to be changed.
application_connector_id (str) – The unique ID associated with the git application connector.
branch_name (str) – Name of the branch in the git repository to be used for training.
python_root (str) – Path from the top level of the git repository to the directory containing the Python source code. If not provided, the default is the root of the git repository.
train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.
train_module_name (str) – Full path of the module that contains the train function from the root of the repository.
predict_module_name (str) – Full path of the module that contains the predict function from the root of the repository.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (the same type as the function's return value).
cpu_size (str) – Size of the cpu for the model training function
memory (int) – Memory (in GB) for the model training function
- Returns
The updated model
- Return type
- set_model_training_config(model_id, training_config)
Edits the default model training config
- set_model_prediction_params(model_id, prediction_config)
Sets the model prediction config for the model
- retrain_model(model_id, deployment_ids=[], feature_group_ids=None, custom_algorithm_configs=None, cpu_size=None, memory=None)
Retrains the specified model. Optionally, choose the deployments to which the retrained model should be deployed.
- Parameters
model_id (str) – The model to retrain.
deployment_ids (list) – List of deployments to automatically deploy to.
feature_group_ids (list) – List of feature group ids provided by the user to train the model on.
custom_algorithm_configs (dict) – The user-defined training configs for each custom algorithm.
cpu_size (str) – Size of the cpu for the user-defined algorithms during train.
memory (int) – Memory (in GB) for the user-defined algorithms during train.
- Returns
The model that is being retrained.
- Return type
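A short sketch (placeholder IDs, initialized client assumed):

    # Retrain and, once metrics check out, auto-promote to an existing deployment.
    model = client.retrain_model(
        model_id='model_id_placeholder',
        deployment_ids=['deployment_id_placeholder'],
    )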
- delete_model(model_id)
Deletes the specified model and all its versions. Models which are currently used in deployments cannot be deleted.
- Parameters
model_id (str) – The ID of the model to delete.
- delete_model_version(model_version)
Deletes the specified model version. Model Versions which are currently used in deployments cannot be deleted.
- Parameters
model_version (str) – The ID of the model version to delete.
- export_model_artifact_as_feature_group(model_version, table_name, artifact_type)
Exports metric artifact data for a model as a feature group.
- Parameters
- Returns
The created feature group.
- Return type
- get_custom_train_function_info(project_id, feature_group_names_for_training=None, training_data_parameter_name_override=None)
Returns the information about how to call the custom train function.
- Parameters
- Returns
Information about how to call the customer provided train function.
- Return type
- create_model_monitor(project_id, training_feature_group_id, prediction_feature_group_id, name=None, refresh_schedule=None, target_value=None, feature_mappings=None, model_id=None, training_feature_mappings=None)
Runs a model monitor for the specified project.
- Parameters
project_id (str) – The unique ID associated with the project.
training_feature_group_id (str) – The unique ID of the training data feature group
prediction_feature_group_id (str) – The unique ID of the prediction data feature group
name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model monitor
target_value (str) – A target positive value for the label to compute bias for
feature_mappings (dict) – A json map to override features for prediction_feature_group, where keys are column names and the values are feature data use types.
model_id (str) – The Unique ID of the Model
training_feature_mappings (dict) – A JSON map to override features for training_feature_group, where keys are column names and the values are feature data use types.
- Returns
The new model monitor that was created.
- Return type
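A hedged sketch; the feature data use type shown in the mapping (PREDICTED_VALUE) is illustrative, and all IDs are placeholders:

    monitor = client.create_model_monitor(
        project_id='project_id_placeholder',
        training_feature_group_id='training_fg_id_placeholder',
        prediction_feature_group_id='prediction_fg_id_placeholder',
        name='Churn Model Monitor',
        target_value='1',  # positive label used for bias computation
        feature_mappings={'churn_score': 'PREDICTED_VALUE'},  # illustrative mapping
        refresh_schedule='0 5 * * *',
    )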
- rerun_model_monitor(model_monitor_id)
Reruns the specified model monitor.
- Parameters
model_monitor_id (str) – The model monitor to rerun.
- Returns
The model monitor that is being rerun.
- Return type
- rename_model_monitor(model_monitor_id, name)
Renames a model monitor
- delete_model_monitor(model_monitor_id)
Deletes the specified model monitor and all its versions.
- Parameters
model_monitor_id (str) – The ID of the model monitor to delete.
- delete_model_monitor_version(model_monitor_version)
Deletes the specified model monitor version.
- Parameters
model_monitor_version (str) – The ID of the model monitor version to delete.
- create_deployment(name=None, model_id=None, feature_group_id=None, project_id=None, description=None, calls_per_second=None, auto_deploy=True, start=True, enable_batch_streaming_updates=False)
Creates a deployment with the specified name and description for the specified model or feature group.
A Deployment makes the trained model or feature group available for prediction requests.
- Parameters
name (str) – The name of the deployment.
model_id (str) – The unique ID associated with the model.
feature_group_id (str) – The unique ID associated with a feature group.
project_id (str) – The unique ID associated with a project.
description (str) – The description for the deployment.
calls_per_second (int) – The number of calls per second the deployment could handle.
auto_deploy (bool) – Flag to enable the automatic deployment when a new Model Version finishes training.
start (bool) –
enable_batch_streaming_updates (bool) – Flag to enable marking the feature group deployment to have a background process cache streamed in rows for quicker lookup
- Returns
The new model or feature group deployment.
- Return type
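A minimal sketch of deploying a trained model (placeholder IDs, initialized client assumed):

    deployment = client.create_deployment(
        name='Churn Model Deployment',
        model_id='model_id_placeholder',
        project_id='project_id_placeholder',
        description='Serves the latest churn model version',
        auto_deploy=True,  # promote new model versions automatically after retraining
        start=True,
    )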
- create_deployment_token(project_id)
Creates a deployment token for the specified project.
Deployment tokens are used to authenticate requests to the prediction APIs and are scoped on the project level.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
The deployment token.
- Return type
- update_deployment(deployment_id, description=None)
Updates a deployment’s description.
- rename_deployment(deployment_id, name)
Updates a deployment’s name and/or description.
- set_auto_deployment(deployment_id, enable=None)
Enable/Disable auto deployment for the specified deployment.
When a model is scheduled to retrain, deployments with this enabled will be marked to automatically promote the new model version. After the newly trained model completes, a check on its metrics in comparison to the currently deployed model version will be performed. If the metrics are comparable or better, the newly trained model version is automatically promoted. If not, it will be marked as a failed model version promotion with an error indicating poor metrics performance.
- set_deployment_model_version(deployment_id, model_version)
Promotes a Model Version to be served in the Deployment
- set_deployment_feature_group_version(deployment_id, feature_group_version)
Promotes a Feature Group Version to be served in the Deployment
- start_deployment(deployment_id)
Restarts the specified deployment that was previously suspended.
- Parameters
deployment_id (str) – The unique ID associated with the deployment.
- stop_deployment(deployment_id)
Stops the specified deployment.
- Parameters
deployment_id (str) – The Deployment ID
- delete_deployment(deployment_id)
Deletes the specified deployment. The deployment’s models will not be affected. Note that the deployments are not recoverable after they are deleted.
- Parameters
deployment_id (str) – The ID of the deployment to delete.
- delete_deployment_token(deployment_token)
Deletes the specified deployment token.
- Parameters
deployment_token (str) – The deployment token to delete.
- set_deployment_feature_group_export_file_connector_output(deployment_id, file_format=None, output_location=None)
Sets the export output for the Feature Group Deployment to be a file connector.
- set_deployment_feature_group_export_database_connector_output(deployment_id, database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)
Sets the export output for the Feature Group Deployment to be a Database connector.
- Parameters
deployment_id (str) – The deployment for which the export type is set
database_connector_id (str) – The database connector ID used
object_name (str) – The database connector’s object to write to
write_mode (str) – UPSERT or INSERT for writing to the database connector
database_feature_mapping (dict) – The column/feature pairs mapping the features to the database columns
id_column (str) – The id column to use as the upsert key
additional_id_columns (list) – For database connectors which support it, additional ID columns to use as a complex key for upserting
- remove_deployment_feature_group_export_output(deployment_id)
Removes the export type that is set for the Feature Group Deployment
- Parameters
deployment_id (str) – The deployment for which the export type is set
- create_refresh_policy(name, cron, refresh_type, project_id=None, dataset_ids=[], model_ids=[], deployment_ids=[], batch_prediction_ids=[], prediction_metric_ids=[])
Creates a refresh policy with a particular cron pattern and refresh type.
A refresh policy allows for the scheduling of a particular set of actions at regular intervals. This can be useful for periodically updated data which needs to be re-imported into the project for re-training.
- Parameters
name (str) – The name for the refresh policy
cron (str) – A cron-like string specifying the frequency of a refresh policy
refresh_type (str) – The refresh type determines what is being refreshed, whether it is a single dataset, a dataset and a model, or more.
project_id (str) – Optionally, a Project ID can be specified so that all datasets, models and deployments are captured at the instant this policy was created
dataset_ids (list) – Comma separated list of Dataset IDs
model_ids (list) – Comma separated list of Model IDs
deployment_ids (list) – Comma separated list of Deployment IDs
batch_prediction_ids (list) – Comma separated list of Batch Predictions
prediction_metric_ids (list) – Comma separated list of Prediction Metrics
- Returns
The refresh policy created
- Return type
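A hedged sketch; the refresh_type string is illustrative only and must be one of the values the API supports:

    policy = client.create_refresh_policy(
        name='Nightly dataset refresh',
        cron='0 2 * * *',          # every day at 02:00 UTC
        refresh_type='DATASET',    # illustrative value; consult the API for supported types
        dataset_ids=['dataset_id_placeholder'],
    )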
- delete_refresh_policy(refresh_policy_id)
Delete a refresh policy
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- pause_refresh_policy(refresh_policy_id)
Pauses a refresh policy
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- resume_refresh_policy(refresh_policy_id)
Resumes a refresh policy
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- run_refresh_policy(refresh_policy_id)
Force a run of the refresh policy.
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- update_refresh_policy(refresh_policy_id, name=None, cron=None)
Update the name or cron string of a refresh policy
- Parameters
- Returns
The updated refresh policy
- Return type
- lookup_features(deployment_token, deployment_id, query_data={})
Returns the feature group deployed in the feature store project.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
- predict(deployment_token, deployment_id, query_data={})
Returns a prediction for Predictive Modeling
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
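A minimal prediction sketch; the deployment token/ID are placeholders, and the column shown is assumed to be mapped to USER_ID in the project:

    result = client.predict(
        deployment_token='deployment_token_placeholder',
        deployment_id='deployment_id_placeholder',
        query_data={'user_id': 'u_1042'},  # column name from your dataset -> its unique value
    )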
- predict_multiple(deployment_token, deployment_id, query_data={})
Returns a list of predictions for Predictive Modeling
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (list) – This will be a list of dictionaries where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
- predict_from_datasets(deployment_token, deployment_id, query_data={})
Returns a list of predictions for Predictive Modeling
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the source dataset name and ‘Value’ will be a list of records corresponding to the dataset rows
- Return type
Dict
- predict_lead(deployment_token, deployment_id, query_data)
Returns the probability that a user will become a lead based on their interactions with the service/product and their own attributes (e.g. income, assets, credit score, etc.). Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘user_id’ mapped to mapping ‘LEAD_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary containing user attributes and/or the user’s interaction data with the product/service (e.g. number of clicks, items in cart, etc.).
- Return type
Dict
- predict_churn(deployment_token, deployment_id, query_data)
Returns the probability that a user will churn based on their interactions with the item/product/service. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘churn_result’ mapped to mapping ‘CHURNED_YN’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
- predict_takeover(deployment_token, deployment_id, query_data)
Returns a probability for each class label associated with the types of fraud or a ‘yes’ or ‘no’ type label for the possibility of fraud. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary containing account activity characteristics (e.g. login id, login duration, login type, ip address, etc.).
- Return type
Dict
- predict_fraud(deployment_token, deployment_id, query_data)
Returns the probability that a transaction performed under a specific account is fraudulent. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_number’ mapped to the mapping ‘ACCOUNT_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary containing transaction attributes (e.g. credit card type, transaction location, transaction amount, etc.).
- Return type
Dict
- predict_class(deployment_token, deployment_id, query_data={}, threshold=None, threshold_class=None, thresholds=None, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)
Returns a classification prediction
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
threshold (float) – A float threshold that is applied to the popular class label.
threshold_class (str) – The label to which the threshold is applied (binary labels only).
thresholds (list) – maps labels to thresholds (Multi label classification only). Defaults to F1 optimal threshold if computed for the given class, else uses 0.5
explain_predictions (bool) – If true, returns the SHAP explanations for all input features.
fixed_features (list) – Set of input features to treat as constant for explanations.
nested (str) – If specified generates prediction delta for each index of the specified nested feature.
explainer_type (str) –
- Return type
Dict
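A hedged sketch of a thresholded binary classification request with explanations (placeholder token, ID, column names, and label):

    result = client.predict_class(
        deployment_token='deployment_token_placeholder',
        deployment_id='deployment_id_placeholder',
        query_data={'transaction_id': 'txn_981'},
        threshold=0.7,              # applied to the popular class label
        threshold_class='FRAUD',    # illustrative binary label
        explain_predictions=True,   # include SHAP explanations in the response
    )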
- predict_target(deployment_token, deployment_id, query_data={}, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)
Returns a prediction from a classification or regression model. Optionally, includes explanations.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
explain_predictions (bool) – If true, returns the SHAP explanations for all input features.
fixed_features (list) – Set of input features to treat as constant for explanations.
nested (str) – If specified generates prediction delta for each index of the specified nested feature.
explainer_type (str) –
- Return type
Dict
- get_anomalies(deployment_token, deployment_id, threshold=None, histogram=False)
Returns a list of anomalies from the training dataset
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
threshold (float) – The threshold score of what is an anomaly. Valid values are between 0.8 and 0.99.
histogram (bool) – If True, will return a histogram of the distribution of all points
- Return type
- is_anomaly(deployment_token, deployment_id, query_data=None)
Returns a list of anomaly attributes based on login information for a specified account. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – The input data for the prediction.
- Return type
Dict
- get_forecast(deployment_token, deployment_id, query_data, future_data=None, num_predictions=None, prediction_start=None)
Returns a list of forecasts for a given entity under the specified project deployment. Note that the inputs to the deployed model will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘holiday_yn’ mapped to mapping ‘FUTURE’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘store_id’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the entity against which forecasting is performed and ‘Value’ will be the unique value of the same entity.
future_data (dict) – This will be a dictionary of values known ahead of time that are relevant for forecasting (e.g. State Holidays, National Holidays, etc.). The key and the value both will be of type ‘String’. For example future data entered for a Store may be {“Holiday”:”No”, “Promo”:”Yes”}.
num_predictions (int) – The number of timestamps to predict in the future.
prediction_start (str) – The start date for predictions (e.g., “2015-08-01T00:00:00” as input for mid-night of 2015-08-01).
- Return type
Dict
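A minimal forecasting sketch; the store column is assumed to be mapped to ITEM_ID, and all identifiers are placeholders:

    forecast = client.get_forecast(
        deployment_token='deployment_token_placeholder',
        deployment_id='deployment_id_placeholder',
        query_data={'store_id': 'store_42'},
        future_data={'Holiday': 'No', 'Promo': 'Yes'},  # values known ahead of time
        num_predictions=14,
        prediction_start='2015-08-01T00:00:00',
    )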
- get_k_nearest(deployment_token, deployment_id, vector, k=None, distance=None, include_score=False)
Returns the k nearest neighbors for the provided embedding vector.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
vector (list) – Input vector to perform the k nearest neighbors with.
k (int) – Overrideable number of items to return
distance (str) – Specify the distance function to use when finding nearest neighbors
include_score (bool) – If True, will return the score alongside the resulting embedding value
- Return type
Dict
- get_multiple_k_nearest(deployment_token, deployment_id, queries)
Returns the k nearest neighbors for the queries provided
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
queries (list) – List of Mappings of format {“catalogId”: “cat0”, “vectors”: […], “k”: 20, “distance”: “euclidean”}. See getKNearest for additional information about the supported parameters
- get_labels(deployment_token, deployment_id, query_data, threshold=None)
Returns a list of scored labels for the provided query data.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.
threshold (None) – Deprecated
- Return type
Dict
- get_recommendations(deployment_token, deployment_id, query_data, num_items=50, page=1, exclude_item_ids=[], score_field='', scaling_factors=[], restrict_items=[], exclude_items=[], explore_fraction=0.0)
Returns a list of recommendations for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘time’ mapped to mapping ‘TIMESTAMP’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which recommendations are made and ‘Value’ will be the unique value of the same user. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.
num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.
page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.
exclude_item_ids (list) – [DEPRECATED]
score_field (str) – The relative item scores are returned in a separate field named with the same name as the key (score_field) for this argument.
scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” in reference to which the model recommendations need to be biased; and the key, “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it or there’s an item that always comes up and you want to demote it.
restrict_items (list) – It allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, “value3”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”, “value3”, …]” to which the recommendations are to be restricted. Let’s take an example where the input to restrict_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. This input will restrict the recommendations to SUVs and Sedans. This type of restriction is particularly useful if there’s a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.
exclude_items (list) – It allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” to exclude from the recommendations. Let’s take an example where the input to exclude_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. The resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful if there’s a list of items that you know is of no use in some particular scenario and you don’t want to show those items present in that list.
explore_fraction (float) – The fraction of recommendations that is to be new items.
- Return type
Dict
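A hedged sketch combining scaling and restriction; the deployment token/ID are placeholders and the column names and values are taken from the illustrative examples above:

    recs = client.get_recommendations(
        deployment_token='deployment_token_placeholder',
        deployment_id='deployment_id_placeholder',
        query_data={'user_name': 'John Doe'},  # column mapped to USER_ID
        num_items=20,
        page=1,
        scaling_factors=[{'column': 'VehicleType', 'values': ['SUV', 'Sedan'], 'factor': 1.4}],
        restrict_items=[{'column': 'VehicleType', 'values': ['SUV', 'Sedan', 'Hatchback']}],
    )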
- get_personalized_ranking(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])
Returns a list of items with personalized promotions on them for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identify those items.
preserve_ranks (list) – List of dictionaries of format {“column”: “col0”, “values”: [“value0, value1”]}, where the ranks of items in query_data is preserved for all the items in “col0” with values, “value0” and “value1”. This option is useful when the desired items are being recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.
preserve_unknown_items (bool) – If true, any items that are unknown to the model, will not be reranked, and the original position in the query will be preserved.
scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” in reference to which the model recommendations need to be biased; and the key, “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.1 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it or there’s an item that always comes up and you want to demote it.
- Return type
Dict
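For illustration, a minimal sketch of calling this method is shown below. The deployment token, deployment ID, column names, and identifier values are placeholders and must match the column mappings configured in your project; client is assumed to be an already-initialized abacusai.ApiClient.

    # Hypothetical deployment credentials and column names; substitute your own values.
    ranked = client.get_personalized_ranking(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={
            'user_id': 'u_123',                               # column mapped to USER_ID
            'movie_name': ['Movie A', 'Movie B', 'Movie C'],  # column mapped to ITEM_ID
        },
        preserve_unknown_items=True,
    )
    print(ranked)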
- get_ranked_items(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])
Returns a list of re-ranked items for a selected user when a list of items is required to be reranked according to the user’s preferences. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary with two key-value pairs. In the first pair, the 'Key' is the column name (e.g. a column named 'user_id' in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user for whom the prediction is made, and the 'Value' is the identifier value for that user. In the second pair, the 'Key' is the name of the column (e.g. movie_name) mapped to ITEM_ID (the unique item identifier), and the 'Value' is a list of identifiers that uniquely identify those items.
preserve_ranks (list) – List of dictionaries of the format {"column": "col0", "values": ["value0", "value1"]}, where the ranks of the items in query_data are preserved for all the items in "col0" with values "value0" and "value1". This option is useful when the desired items are already being recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.
preserve_unknown_items (bool) – If true, any items that are unknown to the model will not be reranked, and their original position in the query will be preserved.
scaling_factors (list) – Allows you to bias the model towards certain items. The input is a list of dictionaries, each of the form {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column, "col0"; the key "values" takes the list of items, ["value0", "value1"], towards which the model recommendations need to be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after the model produces item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there is a type of item that might be less popular but you want to promote it, or there is an item that always comes up and you want to demote it.
- Return type
Dict
- get_related_items(deployment_token, deployment_id, query_data, num_items=50, page=1, scaling_factors=[], restrict_items=[], exclude_items=[])
Returns a list of related items for a given item under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column 'item_code' mapped to mapping 'ITEM_ID' in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which related items are determined and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.
num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.
page (int) – The page number to be displayed. For example, if num_items is set to 10 and the total list contains 50 recommended items, a 'page' value of 2 displays the items ranked 11th through 20th.
scaling_factors (list) – Allows you to bias the model towards certain items. The input is a list of dictionaries, each of the form {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column, "col0"; the key "values" takes the list of items, ["value0", "value1"], towards which the model recommendations need to be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after the model produces item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there is a type of item that might be less popular but you want to promote it, or there is an item that always comes up and you want to demote it.
restrict_items (list) – Restricts the recommendations to certain items. The input is a list of dictionaries, each of the form {"column": "col0", "values": ["value0", "value1", "value3", …]}. The key "column" takes the name of the column, "col0"; the key "values" takes the list of items, ["value0", "value1", "value3", …], to which the recommendations are restricted. For example, if the input to restrict_items is [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], the recommendations are restricted to SUVs and Sedans. This type of restriction is particularly useful when there is a list of items you know to be relevant in a particular scenario and you want the recommendations limited to that list.
exclude_items (list) – Excludes certain items from the list of recommendations. The input is a list of dictionaries, each of the form {"column": "col0", "values": ["value0", "value1", …]}. The key "column" takes the name of the column, "col0"; the key "values" takes the list of items, ["value0", "value1"], to exclude from the recommendations. For example, if the input to exclude_items is [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], the resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful when there is a list of items you know to be irrelevant in a particular scenario and you do not want those items to appear in the recommendations.
- Return type
Dict
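A minimal sketch of a get_related_items call follows. The token, deployment ID, paging values, and the query key/value are placeholders; the key must be the column configured in your project's column mappings, and client is assumed to be an already-initialized abacusai.ApiClient.

    related = client.get_related_items(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'user_name': 'John Doe'},  # key/value per your project's column mappings
        num_items=10,
        page=1,
    )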
- get_feature_group_rows(deployment_token, deployment_id, query_data)
- get_search_results(deployment_token, deployment_id, query_data)
TODO
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.
- Return type
Dict
- get_sentiment(deployment_token, deployment_id, document)
TODO
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
document (str) – # TODO
- Return type
Dict
- get_entailment(deployment_token, deployment_id, document)
TODO
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
document (str) – # TODO
- Return type
Dict
- get_classification(deployment_token, deployment_id, document)
TODO
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
document (str) – # TODO
- Return type
Dict
- get_summary(deployment_token, deployment_id, query_data)
Returns a JSON of the predicted summary for the given document. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column 'text' mapped to mapping 'DOCUMENT' in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Raw data dictionary containing the required document data; it must have a key 'document' whose value is the text (of type DOCUMENT) to be summarized.
- Return type
Dict
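As an illustrative sketch (the token and deployment ID are placeholders, the 'document' key follows the query_data requirement described above, and client is assumed to be an already-initialized abacusai.ApiClient):

    summary = client.get_summary(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'document': 'Full text of the article to be summarized ...'},
    )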
- predict_language(deployment_token, deployment_id, query_data)
TODO
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (str) – # TODO
- Return type
Dict
- get_assignments(deployment_token, deployment_id, query_data, forced_assignments=None)
Get all positive assignments that match a query.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – specifies the set of assignments being requested.
forced_assignments (dict) – set of assignments to force and resolve before returning query results.
- Return type
Dict
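As a sketch only: the query_data contents depend entirely on the assignment use case configured for the deployment, so the keys and values below are hypothetical; client is assumed to be an already-initialized abacusai.ApiClient.

    assignments = client.get_assignments(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'date': '2022-01-01'},           # hypothetical query field
        forced_assignments={'driver_1': 'route_7'},  # hypothetical forced assignment
    )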
- check_constraints(deployment_token, deployment_id, query_data)
Check for any constraints violated by the overrides.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – assignment overrides to the solution.
- Return type
Dict
- create_prediction_metric(feature_group_id, prediction_metric_config, project_id=None)
Create a prediction metric job description for the given prediction and actual-labels data.
- Parameters
feature_group_id (str) – The feature group to use as input to the prediction metrics.
prediction_metric_config (dict) – Specification for prediction metric to run in this job.
project_id (str) – Project to use for the prediction metrics. Defaults to the project for the input feature_group, if the feature_group has exactly one project.
- Returns
The Prediction Metric job description.
- Return type
- describe_prediction_metric(prediction_metric_id, should_include_latest_version_description=True)
Describe a Prediction Metric.
- Parameters
- Returns
The prediction metric object.
- Return type
- delete_prediction_metric(prediction_metric_id)
Removes an existing PredictionMetric.
- Parameters
prediction_metric_id (str) – The unique ID associated with the prediction metric.
- run_prediction_metric(prediction_metric_id)
Creates a new prediction metrics job run for the given prediction metric job description, and starts that job.
Configures and starts the computations needed to compute the prediction metric.
- Parameters
prediction_metric_id (str) – The prediction metric job description to apply for configuring a prediction metric job.
- Returns
A prediction metric version. For more information, please refer to the details on the object (below).
- Return type
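Putting the prediction metric calls together, a minimal sketch of creating and running a prediction metric follows. The feature group ID and config keys are placeholders, the returned object is assumed to expose a prediction_metric_id attribute, and client is assumed to be an already-initialized abacusai.ApiClient.

    metric = client.create_prediction_metric(
        feature_group_id='FEATURE_GROUP_ID',
        prediction_metric_config={'type': 'DECILE_ANALYSIS'},  # hypothetical config
    )
    # Start a run of the metric job that was just described; assumes the returned
    # object exposes a prediction_metric_id attribute.
    metric_version = client.run_prediction_metric(
        prediction_metric_id=metric.prediction_metric_id,
    )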
- delete_prediction_metric_version(prediction_metric_version)
Removes an existing prediction metric version.
- Parameters
prediction_metric_version (str) –
- list_prediction_metric_versions(prediction_metric_id, limit=100, start_after_id=None)
List the prediction metric versions for a prediction metric.
- Parameters
- Returns
The prediction metric instances for this prediction metric.
- Return type
- create_batch_prediction(deployment_id, table_name=None, name=None, global_prediction_args=None, explanations=False, output_format=None, output_location=None, database_connector_id=None, database_output_config=None, refresh_schedule=None, csv_input_prefix=None, csv_prediction_prefix=None, csv_explanations_prefix=None, output_includes_metadata=None, result_input_columns=None)
Creates a batch prediction job description for the given deployment.
- Parameters
deployment_id (str) – The unique identifier to a deployment.
table_name (str) – If specified, the name of the feature group table to write the results of the batch prediction to. Can only be specified if outputLocation and databaseConnectorId are not specified. If tableName is specified, the outputType will be enforced as CSV
name (str) – The name of batch prediction job.
global_prediction_args (dict) – Argument(s) to pass on every prediction call.
explanations (bool) – If true, will provide SHAP Explanations for each prediction, if supported by the use case.
output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON)
output_location (str) – If specified, the location to write the prediction results. Otherwise, results will be stored in Abacus.AI.
database_connector_id (str) – The unique identifier of a Database Connector to write predictions to. Cannot be specified in conjunction with outputLocation.
database_output_config (dict) – A key-value pair of columns/values to write to the database connector. Only available if databaseConnectorId is specified.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically run the batch prediction.
csv_input_prefix (str) – A prefix to prepend to the input columns, only applies when output format is CSV
csv_prediction_prefix (str) – A prefix to prepend to the prediction columns, only applies when output format is CSV
csv_explanations_prefix (str) – A prefix to prepend to the explanation columns, only applies when output format is CSV
output_includes_metadata (bool) – If true, output will contain columns including prediction start time, batch prediction version, and model version
result_input_columns (list) – If present, will limit result files or feature groups to only include columns present in this list
- Returns
The batch prediction description.
- Return type
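For illustration, a minimal sketch of creating and starting a batch prediction that writes CSV results back to Abacus.AI. The deployment ID and schedule are placeholders, the returned object is assumed to expose a batch_prediction_id attribute, and client is assumed to be an already-initialized abacusai.ApiClient.

    batch_prediction = client.create_batch_prediction(
        deployment_id='DEPLOYMENT_ID',
        name='nightly-predictions',
        output_format='CSV',
        refresh_schedule='0 8 * * *',  # run every day at 08:00 UTC
    )
    # Kick off the first version of this batch prediction job; assumes the returned
    # object exposes a batch_prediction_id attribute.
    bp_version = client.start_batch_prediction(
        batch_prediction_id=batch_prediction.batch_prediction_id,
    )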
- start_batch_prediction(batch_prediction_id)
Creates a new batch prediction version job for a given batch prediction job description
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction to create a new version of
- Returns
The batch prediction version started by this method call.
- Return type
- update_batch_prediction(batch_prediction_id, deployment_id=None, global_prediction_args=None, explanations=None, output_format=None, csv_input_prefix=None, csv_prediction_prefix=None, csv_explanations_prefix=None, output_includes_metadata=None, result_input_columns=None)
Updates a batch prediction job description
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction.
deployment_id (str) – The unique identifier to a deployment.
global_prediction_args (dict) – Argument(s) to pass on every prediction call.
explanations (bool) – If true, will provide SHAP Explanations for each prediction, if supported by the use case.
output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON).
csv_input_prefix (str) – A prefix to prepend to the input columns, only applies when output format is CSV
csv_prediction_prefix (str) – A prefix to prepend to the prediction columns, only applies when output format is CSV
csv_explanations_prefix (str) – A prefix to prepend to the explanation columns, only applies when output format is CSV
output_includes_metadata (bool) – If true, output will contain columns including prediction start time, batch prediction version, and model version
result_input_columns (list) – If present, will limit result files or feature groups to only include columns present in this list
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_file_connector_output(batch_prediction_id, output_format=None, output_location=None)
Updates the file connector output configuration of the batch prediction
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction.
output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON).
output_location (str) – If specified, the location to write the prediction results. Otherwise, results will be stored in Abacus.AI.
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_database_connector_output(batch_prediction_id, database_connector_id=None, database_output_config=None)
Updates the database connector output configuration of the batch prediction
- Parameters
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_feature_group_output(batch_prediction_id, table_name)
Creates a feature group and sets it to be the batch prediction output
- Parameters
- Returns
The batch prediction after the output has been applied
- Return type
- set_batch_prediction_output_to_console(batch_prediction_id)
Sets the batch prediction output to the console, clearing both the file connector and database connector config
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_dataset(batch_prediction_id, dataset_type, dataset_id=None)
[Deprecated] Sets the batch prediction input dataset. Only applicable for legacy dataset-based projects
- Parameters
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_feature_group(batch_prediction_id, feature_group_type, feature_group_id=None)
Sets the batch prediction input feature group.
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction
feature_group_type (str) – The feature group type to set. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under the personalized recommendation use case.
feature_group_id (str) – The feature group to set as input to the batch prediction
- Returns
The batch prediction description.
- Return type
- set_batch_prediction_dataset_remap(batch_prediction_id, dataset_id_remap)
For the purpose of this batch prediction, will swap out datasets in the input feature groups
- Parameters
- Returns
Batch Prediction object
- Return type
- delete_batch_prediction(batch_prediction_id)
Deletes a batch prediction
- Parameters
batch_prediction_id (str) – The unique identifier of the batch prediction
- add_user_item_interaction(streaming_token, dataset_id, timestamp, user_id, item_id, event_type, additional_attributes)
Adds a user-item interaction record (data row) to a streaming dataset.
- Parameters
streaming_token (str) – The streaming token for authenticating requests to the dataset.
dataset_id (str) – The streaming dataset to record data to.
timestamp (int) – The unix timestamp of the event.
user_id (str) – The unique identifier for the user.
item_id (list) – The unique identifier for the items
event_type (str) – The event type.
additional_attributes (dict) – Attributes of the user interaction.
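A minimal sketch of streaming a single interaction is shown below; the streaming token, dataset ID, identifiers, and attribute names are placeholders, and client is assumed to be an already-initialized abacusai.ApiClient.

    import time

    client.add_user_item_interaction(
        streaming_token='STREAMING_TOKEN',
        dataset_id='DATASET_ID',
        timestamp=int(time.time()),      # unix timestamp of the event
        user_id='u_123',
        item_id=['item_456'],            # list of item identifiers
        event_type='click',
        additional_attributes={'device': 'mobile'},
    )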
- upsert_user_attributes(streaming_token, dataset_id, user_id, user_attributes)
Adds a user attributes record (data row) to a streaming dataset.
Either the streaming dataset ID or the project ID is required.
- upsert_item_attributes(streaming_token, dataset_id, item_id, item_attributes)
Adds an item attributes record (data row) to a streaming dataset.
Either the streaming dataset ID or the project ID is required.
- add_multiple_user_item_interactions(streaming_token, dataset_id, interactions)
Adds multiple user-item interaction records (data rows) to a streaming dataset.
- Parameters
streaming_token (str) – The streaming token for authenticating requests to the dataset.
dataset_id (str) – The streaming dataset to record data to.
interactions (list) – List of interactions, each interaction of format {‘userId’: userId, ‘timestamp’: timestamp, ‘itemId’: itemId, ‘eventType’: eventType, ‘additionalAttributes’: {‘attribute1’: ‘abc’, ‘attribute2’: 123}}
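Following the interaction format described above, a minimal sketch of a batched call; the streaming token, dataset ID, identifiers, and attribute values are placeholders, and client is assumed to be an already-initialized abacusai.ApiClient.

    import time

    now = int(time.time())
    interactions = [
        {'userId': 'u_123', 'timestamp': now, 'itemId': 'item_456',
         'eventType': 'click', 'additionalAttributes': {'device': 'mobile'}},
        {'userId': 'u_456', 'timestamp': now, 'itemId': 'item_789',
         'eventType': 'purchase', 'additionalAttributes': {'device': 'desktop'}},
    ]
    client.add_multiple_user_item_interactions(
        streaming_token='STREAMING_TOKEN',
        dataset_id='DATASET_ID',
        interactions=interactions,
    )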
- upsert_multiple_user_attributes(streaming_token, dataset_id, upserts)
Adds multiple user attributes records (data row) to a streaming dataset.
The streaming dataset ID is required.
- upsert_multiple_item_attributes(streaming_token, dataset_id, upserts)
Adds multiple item attributes records (data row) to a streaming dataset.
The streaming dataset ID is required.
- upsert_item_embeddings(streaming_token, model_id, item_id, vector, catalog_id=None)
Upserts an embedding vector for an item id for a model_id.
- Parameters
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – The model id to upsert item embeddings to.
item_id (str) – The item id for which its embeddings will be upserted.
vector (list) – The embedding vector.
catalog_id (str) – Optional name to specify which catalog in a model to update.
- delete_item_embeddings(streaming_token, model_id, item_ids, catalog_id=None)
Deletes knn embeddings for a list of item ids for a model_id.
- Parameters
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – The model id to delete item embeddings from.
item_ids (list) – A list of item ids for which its embeddings will be deleted.
catalog_id (str) – Optional name to specify which catalog in a model to update.
- upsert_multiple_item_embeddings(streaming_token, model_id, upserts, catalog_id=None)
Upserts a knn embedding for multiple item ids for a model_id.
- Parameters
streaming_token (str) – The streaming token for authenticating requests to the model.
model_id (str) – The model id to upsert item embeddings to.
upserts (list) – A list of {‘itemId’: …, ‘vector’: […]} dicts for each upsert.
catalog_id (str) – Optional name to specify which catalog in a model to update.
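Using the upsert format described above, a minimal sketch follows; the model ID, item IDs, and vectors are placeholders, the vector length must match what the model expects, and client is assumed to be an already-initialized abacusai.ApiClient.

    upserts = [
        {'itemId': 'item_001', 'vector': [0.12, -0.05, 0.33]},
        {'itemId': 'item_002', 'vector': [0.07, 0.41, -0.20]},
    ]
    client.upsert_multiple_item_embeddings(
        streaming_token='STREAMING_TOKEN',
        model_id='MODEL_ID',
        upserts=upserts,
    )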
- upsert_data(feature_group_id, streaming_token, data)
Updates data in the feature group for a given lookup key recordId if the recordId is found; otherwise, inserts new data into the feature group.
- append_data(feature_group_id, streaming_token, data)
Appends new data into the feature group for a given lookup key recordId.
- upsert_multiple_data(feature_group_id, streaming_token, data)
Updates data in the feature group for a given lookup key recordId if the recordId is found; otherwise, inserts new data into the feature group.
- append_multiple_data(feature_group_id, streaming_token, data)
Appends new data into the feature group for a given lookup key recordId.
- create_algorithm(name, problem_type, source_code=None, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, config_options=None, is_default_enabled=False, project_id=None)
Creates a custom algorithm that is reusable for model training
- Parameters
name (str) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed
problem_type (str) – The type of the problem this algorithm will work on
source_code (str) – Contents of a valid python source code file. The source code should contain the train/predict/predict_many/initialize functions. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
training_data_parameter_names_mapping (dict) – The mapping from feature group types to training data parameter names in the train function
training_config_parameter_name (str) – The train config parameter name in the train function
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.
initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model
config_options (dict) – Map dataset types and configs to train function parameter names
is_default_enabled (bool) – Whether to train with the algorithm by default
project_id (str) – The unique ID of the project
- Returns
The new custom algorithm that can be used for training
- Return type
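As a sketch of registering a custom algorithm: the problem type string, feature group type key, and function bodies below are placeholders, and client is assumed to be an already-initialized abacusai.ApiClient; consult the user functions documentation for the exact contract expected for your problem type.

    source_code = '''
    def train(training_table, training_config):
        # Train and return a serializable model object (placeholder body).
        return {"trained": True}

    def predict(model, query):
        # Return a prediction for a single query (placeholder body).
        return {"prediction": 0}
    '''

    algorithm = client.create_algorithm(
        name='MY_CUSTOM_ALGORITHM',
        problem_type='REGRESSION',  # hypothetical problem type string
        source_code=source_code,
        training_data_parameter_names_mapping={'TRAINING': 'training_table'},  # hypothetical feature group type
        training_config_parameter_name='training_config',
        train_function_name='train',
        predict_function_name='predict',
    )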
- delete_algorithm(algorithm)
Deletes the specified custom algorithm.
- Parameters
algorithm (str) – The name of the algorithm to delete.
- update_algorithm(algorithm, source_code=None, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, config_options=None, is_default_enabled=None)
Updates the custom algorithm with the given algorithm name. If source_code is provided, all the function names in the source code must also be provided.
- Parameters
algorithm (str) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed
source_code (str) – Contents of a valid python source code file. The source code should contain the train/predict/predict_many/initialize functions. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
training_data_parameter_names_mapping (dict) – The mapping from feature group types to training data parameter names in the train function
training_config_parameter_name (str) – The train config parameter name in the train function
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.
initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model
config_options (dict) – Map dataset types and configs to train function parameter names
is_default_enabled (bool) – Whether to train with the algorithm by default
- Returns
The new custom algorithm that can be used for training
- Return type
- exception abacusai.ApiException(message, http_status, exception=None)
Bases:
Exception
Default ApiException raised by APIs
- Parameters
- __str__()
Return str(self).
- class abacusai.ClientOptions(exception_on_404=True, server=DEFAULT_SERVER)
Options for configuring the ApiClient
- class abacusai.ReadOnlyClient(api_key=None, server=None, client_options=None, skip_version_check=False)
Bases:
BaseApiClient
Abacus.AI Read Only API Client. Only contains GET methods
- Parameters
api_key (str) – The api key to use as authentication to the server
server (str) – The base server url to use to send API requests to
client_options (ClientOptions) – Optional API client configurations
skip_version_check (bool) – If true, will skip checking the server’s current API version on initializing the client
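For illustration, a minimal sketch of constructing a read-only client and making a GET-style call; the API key is a placeholder.

    from abacusai import ReadOnlyClient

    read_only_client = ReadOnlyClient(api_key='YOUR_API_KEY')
    # Any of the GET-style methods documented below can be called on this client.
    api_keys = read_only_client.list_api_keys()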
- list_api_keys()
Lists all of the user's API keys in the user's organization.
- Returns
List of API Keys for this user.
- Return type
- list_organization_users()
Retrieves a list of all users in the organization.
This method will retrieve a list containing all the users in the organization. The list includes pending users who have been invited to the organization.
- Returns
Array of all of the users in the Organization
- Return type
- describe_user()
Get the current user’s information, such as their name, email, admin status, etc.
- Returns
Information about the current User
- Return type
- list_organization_groups()
Lists all Organizations Groups within this Organization
- Returns
List of Groups in this Organization
- Return type
- describe_organization_group(organization_group_id)
Returns the specific organization group passed in by the user.
- Parameters
organization_group_id (str) – The unique ID of the organization group that needs to be described.
- Returns
Information about a specific Organization Group
- Return type
- list_use_cases()
Retrieves a list of all use cases with descriptions. Use the given mappings to specify a use case when needed.
- Returns
A list of UseCase objects describing all the use cases addressed by the platform.
- Return type
- describe_use_case_requirements(use_case)
This API call returns the feature requirements for a specified use case
- Parameters
use_case (str) – This will contain the Enum String for the use case whose dataset requirements are needed.
- Returns
The feature requirements of the use case are returned. This includes all the feature groups required for the use case along with their descriptions and feature mapping details.
- Return type
- describe_project(project_id)
Returns a description of a project.
- list_projects(limit=100, start_after_id=None)
Retrieves a list of all projects in the current organization.
- list_project_datasets(project_id)
Retrieves all dataset(s) attached to a specified project. This API returns all attributes of each dataset, such as its name, type, and ID.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array representing all of the datasets attached to the project.
- Return type
- get_schema(project_id, dataset_id)
[DEPRECATED] Returns a schema given a specific dataset in a project. The schema of the dataset consists of the columns in the dataset, the data type of the column, and the column’s column mapping.
- validate_project(project_id, feature_group_ids=None)
Validates that the specified project has all required feature group types for its use case and that all required feature columns are set.
- Parameters
- Returns
The project validation. If the specified project is missing required columns or feature groups, the response includes an array of objects for each missing required feature group and the missing required features in each feature group.
- Return type
- get_feature_group_schema(feature_group_id, project_id=None)
Returns a schema given a specific FeatureGroup in a project.
- describe_feature_group(feature_group_id)
Describe a Feature Group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
- Returns
The feature group object.
- Return type
- describe_feature_group_by_table_name(table_name)
Describe a Feature Group by the feature group’s table name
- Parameters
table_name (str) – The unique table name of the Feature Group to lookup
- Returns
The Feature Group
- Return type
- list_feature_groups(limit=100, start_after_id=None, feature_group_template_id=None, is_including_detached_from_template=False)
Lists all the feature groups in the organization, optionally scoped by a feature group template.
- Parameters
limit (int) – The number of feature groups to be retrieved.
start_after_id (str) – An offset parameter to exclude all feature groups till a specified ID.
feature_group_template_id (str) – If specified, limit results to feature groups attached to this template id.
is_including_detached_from_template (bool) – When feature_group_template_id is specified, include feature groups that were detached from that template id.
- Returns
All the feature groups in the organization
- Return type
- list_project_feature_groups(project_id, filter_feature_group_use=None)
List all the feature groups associated with a project
- Parameters
project_id (str) – The unique ID associated with the project.
filter_feature_group_use (str) – The feature group use filter, when given as an argument, only allows feature groups in this project to be returned if they are of the given use. DATA_WRANGLING, TRAINING_INPUT, BATCH_PREDICTION_INPUT, BATCH_PREDICTION_OUTPUT
- Returns
All the Feature Groups in the Organization
- Return type
- get_feature_group_version_export_download_url(feature_group_export_id)
Get a link to download the feature group version.
- Parameters
feature_group_export_id (str) – The Feature Group Export to get signed url for.
- Returns
The FeatureGroupExportDownloadUrl instance, which contains the download URL and expiration time.
- Return type
- describe_feature_group_export(feature_group_export_id)
Describes a feature group export.
- Parameters
feature_group_export_id (str) – The ID of the feature group export.
- Returns
The feature group export
- Return type
- list_feature_group_exports(feature_group_id)
Lists all of the feature group exports for a given feature group
- Parameters
feature_group_id (str) – The ID of the feature group
- Returns
The feature group exports
- Return type
- list_feature_group_modifiers(feature_group_id)
Lists the users who can modify a feature group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
- Returns
Modification lock status and groups and organizations added to the feature group.
- Return type
- get_materialization_logs(feature_group_version, stdout=False, stderr=False)
Returns logs for materialized feature group version.
- Parameters
- Returns
The function logs.
- Return type
- list_feature_group_versions(feature_group_id, limit=100, start_after_version=None)
Retrieves a list of all feature group versions for the specified feature group.
- Parameters
- Returns
An array of feature group versions.
- Return type
- describe_feature_group_version(feature_group_version)
Get a specific feature group version.
- Parameters
feature_group_version (str) – The unique ID associated with the feature group version.
- Returns
A feature group version.
- Return type
- describe_feature_group_template(feature_group_template_id)
Describe a Feature Group Template.
- Parameters
feature_group_template_id (str) – The unique ID of a feature group template.
- Returns
The feature group template object.
- Return type
- list_feature_group_templates(limit=100, start_after_id=None, feature_group_id=None)
List feature group templates, optionally scoped by the feature group that created the templates.
- Parameters
- Returns
All the feature group templates in the organization, optionally limited by the feature group that created the template(s).
- Return type
- list_project_feature_group_templates(project_id, limit=100, start_after_id=None)
List feature group templates for feature groups associated with the project.
- Parameters
project_id (str) – Limit to templates associated with this project, e.g. templates associated with feature groups in this project.
limit (int) – The maximum number of templates to be retrieved.
start_after_id (str) – An offset parameter to exclude all templates till the specified feature group template ID.
- Returns
All the feature group templates for the feature groups associated with the project.
- Return type
- suggest_feature_group_template_for_feature_group(feature_group_id)
Suggest values for a feature group template, based on a feature group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group to use for suggesting values to use for the template.
- Returns
None
- Return type
- get_dataset_schema(dataset_id)
Retrieves the column schema of a dataset
- Parameters
dataset_id (str) – The Dataset schema to lookup.
- Returns
List of Column schema definitions
- Return type
- get_file_connector_instructions(bucket, write_permission=False)
Retrieves verification information to create a data connector to a cloud storage bucket.
- Parameters
- Returns
An object with full description of the cloud storage bucket authentication options and bucket policy. Returns an error message if the parameters are invalid.
- Return type
- list_database_connectors()
Retrieves a list of all of the database connectors along with all their attributes.
- Returns
The database connectors.
- Return type
- list_file_connectors()
Retrieves a list of all connected services in the organization and their current verification status.
- Returns
An array of cloud storage buckets connected to the organization.
- Return type
- list_database_connector_objects(database_connector_id)
Lists queryable objects in the database connector.
- get_database_connector_object_schema(database_connector_id, object_name=None)
Get the schema of an object in a database connector.
- list_application_connectors()
Retrieves a list of all of the application connectors along with all their attributes.
- Returns
The application connectors.
- Return type
- list_application_connector_objects(application_connector_id)
Lists queryable objects in the application connector.
- list_streaming_connectors()
Retrieves a list of all of the streaming connectors along with all their attributes.
- Returns
The streaming connectors.
- Return type
- list_streaming_tokens()
Retrieves a list of all streaming tokens along with their attributes.
- Returns
An array of streaming tokens.
- Return type
- get_recent_feature_group_streamed_data(feature_group_id)
Returns recently streamed data to a streaming feature group.
- Parameters
feature_group_id (str) – The unique ID associated with the feature group.
- list_uploads()
Lists all ongoing uploads in the organization
- Returns
An array of uploads.
- Return type
- describe_upload(upload_id)
Retrieves the current upload status (complete or inspecting) and the list of file parts uploaded for a specified dataset upload.
- list_datasets(limit=100, start_after_id=None, exclude_streaming=False)
Retrieves a list of all of the datasets in the organization.
- describe_dataset(dataset_id)
Retrieves a full description of the specified dataset, with attributes such as its ID, name, source type, etc.
- describe_dataset_version(dataset_version)
Retrieves a full description of the specified dataset version, with attributes such as its ID, name, source type, etc.
- Parameters
dataset_version (str) – The unique ID associated with the dataset version.
- Returns
The dataset version.
- Return type
- list_dataset_versions(dataset_id, limit=100, start_after_version=None)
Retrieves a list of all dataset versions for the specified dataset.
- Parameters
- Returns
A list of dataset versions.
- Return type
- get_training_config_options(project_id, feature_group_ids=None, for_retrain=False)
Retrieves the full description of the model training configuration options available for the specified project.
The configuration options available are determined by the use case associated with the specified project. Refer to the [Use Case Documentation](https://api.abacus.ai/app/help/useCases) for more information on use cases and use case specific configuration options.
- Parameters
- Returns
An array of options that can be specified when training a model in this project.
- Return type
- list_models(project_id)
Retrieves the list of models in the specified project.
- describe_model(model_id)
Retrieves a full description of the specified model.
- get_model_metrics(model_id, model_version=None, baseline_metrics=False)
Retrieves a full list of the metrics for the specified model.
If only the model’s unique identifier (modelId) is specified, the latest trained version of model (modelVersion) is used.
- Parameters
- Returns
An object to show the model metrics and explanations for what each metric means.
- Return type
- list_model_versions(model_id, limit=100, start_after_version=None)
Retrieves a list of the versions for a given model.
- Parameters
- Returns
An array of model versions.
- Return type
- describe_model_version(model_version)
Retrieves a full description of the specified model version
- Parameters
model_version (str) – The unique version ID of the model version
- Returns
A model version.
- Return type
- get_training_data_logs(model_version)
Retrieves the data preparation logs during model training.
- Parameters
model_version (str) – The unique version ID of the model version
- Returns
A list of logs.
- Return type
- set_default_model_algorithm(model_id=None, algorithm=None)
Sets the model’s algorithm to default for all new deployments
- get_training_logs(model_version, stdout=False, stderr=False)
Returns training logs for the model.
- Parameters
- Returns
The function logs.
- Return type
- list_model_monitors(project_id)
Retrieves the list of model monitors in the specified project.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array of model monitors.
- Return type
- describe_model_monitor(model_monitor_id)
Retrieves a full description of the specified model monitor.
- Parameters
model_monitor_id (str) – The unique ID associated with the model monitor.
- Returns
The description of the model monitor.
- Return type
- get_prediction_drift(model_monitor_version)
Gets the label and prediction drifts for a model monitor.
- Parameters
model_monitor_version (str) – The unique identifier to a model monitor version created under the project.
- Returns
An object describing training and prediction output label and prediction distributions.
- Return type
- get_model_monitor_summary(model_monitor_id)
- Parameters
model_monitor_id (str) –
- Returns
None
- Return type
- list_model_monitor_versions(model_monitor_id, limit=100, start_after_version=None)
Retrieves a list of the versions for a given model monitor.
- Parameters
- Returns
An array of model monitor versions.
- Return type
- describe_model_monitor_version(model_monitor_version)
Retrieves a full description of the specified model monitor version
- Parameters
model_monitor_version (str) – The unique version ID of the model monitor version
- Returns
A model monitor version.
- Return type
- model_monitor_version_metric_data(model_monitor_version, metric_type)
Provides the data needed for decile metrics associated with the model monitor.
- Parameters
- Returns
Data associated with the metric.
- Return type
- get_model_monitoring_logs(model_monitor_version, stdout=False, stderr=False)
Returns monitoring logs for the model.
- Parameters
- Returns
The function logs.
- Return type
- get_drift_for_feature(model_monitor_version, feature_name)
Gets the feature drift associated with a single feature in an output feature group from a prediction.
- get_outliers_for_feature(model_monitor_version, feature_name=None)
Gets a list of outliers measured by a single feature (or overall) in an output feature group from a prediction.
- get_outliers_for_batch_prediction_feature(batch_prediction_version, feature_name=None)
Gets a list of outliers measured by a single feature (or overall) in an output feature group from a prediction.
- describe_deployment(deployment_id)
Retrieves a full description of the specified deployment.
- Parameters
deployment_id (str) – The unique ID associated with the deployment.
- Returns
The description of the deployment.
- Return type
- list_deployments(project_id)
Retrieves a list of all deployments in the specified project.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array of deployments.
- Return type
- list_deployment_tokens(project_id)
Retrieves a list of all deployment tokens in the specified project.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array of deployment tokens.
- Return type
- describe_refresh_policy(refresh_policy_id)
Retrieve a single refresh policy
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- Returns
A refresh policy object
- Return type
- describe_refresh_pipeline_run(refresh_pipeline_run_id)
Retrieve a single refresh pipeline run
- Parameters
refresh_pipeline_run_id (str) – The unique ID associated with this refresh pipeline_run
- Returns
A refresh pipeline run object
- Return type
- list_refresh_policies(project_id=None, dataset_ids=[], model_ids=[], deployment_ids=[], batch_prediction_ids=[], model_monitor_ids=[], prediction_metric_ids=[])
List the refresh policies for the organization
- Parameters
project_id (str) – Optionally, a Project ID can be specified so that all datasets, models and deployments are captured at the instant this policy was created
dataset_ids (list) – Comma separated list of Dataset IDs
model_ids (list) – Comma separated list of Model IDs
deployment_ids (list) – Comma separated list of Deployment IDs
batch_prediction_ids (list) – Comma separated list of Batch Prediction IDs
model_monitor_ids (list) – Comma separated list of Model Monitor IDs.
prediction_metric_ids (list) – Comma separated list of Prediction Metric IDs,
- Returns
List of all refresh policies in the organization
- Return type
- list_refresh_pipeline_runs(refresh_policy_id)
Lists the times that the refresh policy has been run
- Parameters
refresh_policy_id (str) – The unique ID associated with this refresh policy
- Returns
A list of refresh pipeline runs for the given refresh policy id
- Return type
- list_prediction_metrics(feature_group_id, limit=100, should_include_latest_version_description=True, start_after_id=None)
List the prediction metrics for a feature group.
- Parameters
feature_group_id (str) – The feature group used as input to this prediction metric.
limit (int) – The number of prediction metrics to be retrieved.
should_include_latest_version_description (bool) – include the description of the latest prediction metric version for each prediction metric
start_after_id (str) – An offset parameter to exclude all prediction metrics till the specified prediction metric ID.
- Returns
The prediction metrics for this feature group.
- Return type
- query_prediction_metrics(feature_group_id=None, project_id=None, limit=100, should_include_latest_version_description=True, start_after_id=None)
Query and return prediction metrics and extra data needed by the UI, constrained by the parameters provided.
- Parameters
feature_group_id (str) – [optional] The feature group used as input to the prediction metrics.
project_id (str) – [optional] The project ID of the prediction metrics.
limit (int) – The number of prediction metrics to be retrieved.
should_include_latest_version_description (bool) – Include the description of the latest prediction metric version for each prediction metric.
start_after_id (str) – An offset parameter to exclude all prediction metrics till the specified prediction metric ID.
- describe_prediction_metric_version(prediction_metric_version)
Retrieves a full description of the specified prediction metric version
- Parameters
prediction_metric_version (str) – The unique version ID of the prediction metric version
- Returns
A prediction metric version. For more information, please refer to the details on the object (below).
- Return type
- download_batch_prediction_result_chunk(batch_prediction_version, offset=0, chunk_size=10485760)
Returns a stream containing the batch prediction results
- Parameters
- Return type
- get_batch_prediction_connector_errors(batch_prediction_version)
Returns a stream containing the batch prediction database connection write errors, if any writes failed to the database connector
- Parameters
batch_prediction_version (str) – The unique identifier of the batch prediction job to get the errors for
- Return type
- list_batch_predictions(project_id)
Retrieves a list for the batch predictions in the project
- Parameters
project_id (str) – The unique identifier of the project
- Returns
A list of batch prediction jobs.
- Return type
- describe_batch_prediction(batch_prediction_id)
Describes the batch prediction
- Parameters
batch_prediction_id (str) – The unique ID associated with the batch prediction.
- Returns
The batch prediction description.
- Return type
- list_batch_prediction_versions(batch_prediction_id, limit=100, start_after_version=None)
Retrieves a list of versions of a given batch prediction
- Parameters
- Returns
A list of batch prediction versions.
- Return type
- class abacusai.PredictionClient(client_options=None)
Bases:
abacusai.client.BaseApiClient
Abacus.AI Prediction API Client. Does not utilize authentication and only contains public prediction methods
- Parameters
client_options (ClientOptions) – Optional API client configurations
- predict_raw(deployment_token, deployment_id, **kwargs)
Raw interface for returning predictions from Plug and Play deployments.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
**kwargs (dict) – Arbitrary key/value pairs may be passed in and are sent as part of the request body.
- lookup_features(deployment_token, deployment_id, query_data={})
Returns the feature group deployed in the feature store project.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
- predict(deployment_token, deployment_id, query_data={})
Returns a prediction for Predictive Modeling
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
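A minimal sketch of calling predict through the public prediction client is shown below; the deployment token, deployment ID, and query column are placeholders and must match your project's column mappings.

    from abacusai import PredictionClient

    prediction_client = PredictionClient()
    prediction = prediction_client.predict(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'user_id': 'u_123'},  # key/value per your project's column mappings
    )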
- predict_multiple(deployment_token, deployment_id, query_data={})
Returns a list of predictions for Predictive Modeling
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (list) – This will be a list of dictionaries where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
- predict_from_datasets(deployment_token, deployment_id, query_data={})
Returns a list of predictions for Predictive Modeling
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the source dataset name and ‘Value’ will be a list of records corresponding to the dataset rows
- Return type
Dict
- predict_lead(deployment_token, deployment_id, query_data)
Returns the probability of a user to be a lead on the basis of his/her interaction with the service/product and user’s own attributes (e.g. income, assets, credit score, etc.). Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘user_id’ mapped to mapping ‘LEAD_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary containing user attributes and/or user’s interaction data with the product/service (e.g. number of click, items in cart, etc.).
- Return type
Dict
- predict_churn(deployment_token, deployment_id, query_data)
Returns a probability of a user to churn out in response to his/her interactions with the item/product/service. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘churn_result’ mapped to mapping ‘CHURNED_YN’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
- Return type
Dict
- predict_takeover(deployment_token, deployment_id, query_data)
Returns a probability for each class label associated with the types of fraud or a ‘yes’ or ‘no’ type label for the possibility of fraud. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary containing account activity characteristics (e.g. login id, login duration, login type, ip address, etc.).
- Return type
Dict
- predict_fraud(deployment_token, deployment_id, query_data)
Returns the probability that a transaction performed under a specific account is fraudulent. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_number’ mapped to the mapping ‘ACCOUNT_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary containing transaction attributes (e.g. credit card type, transaction location, transaction amount, etc.).
- Return type
Dict
- predict_class(deployment_token, deployment_id, query_data={}, threshold=None, threshold_class=None, thresholds=None, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)
Returns a classification prediction
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
threshold (float) – float value that is applied on the popular class label.
threshold_class (str) – label upon which the threshold is added (Binary labels only)
thresholds (list) – maps labels to thresholds (Multi label classification only). Defaults to F1 optimal threshold if computed for the given class, else uses 0.5
explain_predictions (bool) – If true, returns the SHAP explanations for all input features.
fixed_features (list) – Set of input features to treat as constant for explanations.
nested (str) – If specified generates prediction delta for each index of the specified nested feature.
explainer_type (str) –
- Return type
Dict
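A hedged sketch of predict_class combining a binary threshold with SHAP explanations; ‘user_id’, the identifier value, the threshold, and the label name are assumptions about a particular project setup:

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder key
prediction = client.predict_class(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'user_id': 'u_123'},  # column mapped to USER_ID in this hypothetical project
    threshold=0.6,                    # applied to the popular class label
    threshold_class='churned',        # hypothetical binary label name
    explain_predictions=True,         # include SHAP explanations for all input features
)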
- predict_target(deployment_token, deployment_id, query_data={}, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)
Returns a prediction from a classification or regression model. Optionally, includes explanations.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.
explain_predictions (bool) – If true, returns the SHAP explanations for all input features.
fixed_features (list) – Set of input features to treat as constant for explanations.
nested (str) – If specified generates prediction delta for each index of the specified nested feature.
explainer_type (str) –
- Return type
Dict
- get_anomalies(deployment_token, deployment_id, threshold=None, histogram=False)
Returns a list of anomalies from the training dataset
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
threshold (float) – The threshold score of what is an anomaly. Valid values are between 0.8 and 0.99.
histogram (bool) – If True, will return a histogram of the distribution of all points
- Return type
- is_anomaly(deployment_token, deployment_id, query_data=None)
Returns a list of anomaly attributes based on login information for a specified account. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – The input data for the prediction.
- Return type
Dict
- get_forecast(deployment_token, deployment_id, query_data, future_data=None, num_predictions=None, prediction_start=None)
Returns a list of forecasts for a given entity under the specified project deployment. Note that the inputs to the deployed model will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘holiday_yn’ mapped to mapping ‘FUTURE’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘store_id’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the entity against which forecasting is performed and ‘Value’ will be the unique value of the same entity.
future_data (dict) – This will be a dictionary of values known ahead of time that are relevant for forecasting (e.g. State Holidays, National Holidays, etc.). The key and the value both will be of type ‘String’. For example future data entered for a Store may be {“Holiday”:”No”, “Promo”:”Yes”}.
num_predictions (int) – The number of timestamps to predict in the future.
prediction_start (str) – The start date for predictions (e.g., “2015-08-01T00:00:00” as input for mid-night of 2015-08-01).
- Return type
Dict
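An illustrative get_forecast call based on the parameter descriptions above; the store identifier, future-data keys, and dates are placeholders:

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder key
forecast = client.get_forecast(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'store_id': '17'},                  # column mapped to ITEM_ID
    future_data={'Holiday': 'No', 'Promo': 'Yes'},  # values known ahead of time
    num_predictions=14,                             # forecast 14 timestamps ahead
    prediction_start='2015-08-01T00:00:00',
)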
- get_k_nearest(deployment_token, deployment_id, vector, k=None, distance=None, include_score=False)
Returns the k nearest neighbors for the provided embedding vector.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
vector (list) – Input vector to perform the k nearest neighbors with.
k (int) – Overrideable number of items to return
distance (str) – Specify the distance function to use when finding nearest neighbors
include_score (bool) – If True, will return the score alongside the resulting embedding value
- Return type
Dict
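A minimal get_k_nearest sketch; the vector values, k, and distance name are illustrative assumptions:

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder key
neighbors = client.get_k_nearest(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    vector=[0.12, -0.45, 0.78, 0.03],  # embedding to search around
    k=10,
    distance='euclidean',              # assumed distance function name
    include_score=True,
)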
- get_multiple_k_nearest(deployment_token, deployment_id, queries)
Returns the k nearest neighbors for the queries provided
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
queries (list) – List of Mappings of format {“catalogId”: “cat0”, “vectors”: […], “k”: 20, “distance”: “euclidean”}. See getKNearest for additional information about the supported parameters
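A sketch batching several nearest-neighbor lookups in one call, following the mapping format documented for queries; the catalog IDs and vectors are placeholders:

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder key
client.get_multiple_k_nearest(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    queries=[
        {'catalogId': 'cat0', 'vectors': [[0.1, 0.2, 0.3]], 'k': 20, 'distance': 'euclidean'},
        {'catalogId': 'cat1', 'vectors': [[0.4, 0.5, 0.6]], 'k': 5, 'distance': 'euclidean'},
    ],
)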
- get_labels(deployment_token, deployment_id, query_data, threshold=None)
Returns a list of scored labels extracted from the given text.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.
threshold (None) – Deprecated
- Return type
Dict
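A minimal get_labels sketch using the documented “Content” key; the sample sentence is invented:

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder key
labels = client.get_labels(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'Content': 'Jane Doe joined Acme Corp in Berlin in 2019.'},  # text to extract entities from
)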
- get_recommendations(deployment_token, deployment_id, query_data, num_items=50, page=1, exclude_item_ids=[], score_field='', scaling_factors=[], restrict_items=[], exclude_items=[], explore_fraction=0.0)
Returns a list of recommendations for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘time’ mapped to mapping ‘TIMESTAMP’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which recommendations are made and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.
num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.
page (int) – The page number to be displayed. For example, if num_items is set to 10 and there are 50 recommended items in total, a ‘page’ value of 2 will display the items ranked 11th through 20th.
exclude_item_ids (list) – [DEPRECATED]
score_field (str) – The relative item scores are returned in a separate field named with the same name as the key (score_field) for this argument.
scaling_factors (list) – Allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key “column” takes the name of the column (“col0”); the key “values” takes the list of items ([“value0”, “value1”]) towards which the model recommendations need to be biased; and the key “factor” takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}], then after the model produces item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or an item that always comes up and you want to demote it.
restrict_items (list) – Allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, “value3”, …]}. The key “column” takes the name of the column (“col0”) and the key “values” takes the list of items ([“value0”, “value1”, “value3”, …]) to which the recommendations should be restricted. For example, if the input to restrict_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}], the recommendations are restricted to SUVs and Sedans. This type of restriction is particularly useful when you know a certain list of items is relevant in a given scenario and you want to restrict the recommendations to that list.
exclude_items (list) – Allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, …]}. The key “column” takes the name of the column (“col0”) and the key “values” takes the list of items ([“value0”, “value1”]) to exclude from the recommendations. For example, if the input to exclude_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}], the resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful when you know a certain list of items is of no use in a given scenario and you don’t want to show them.
explore_fraction (float) – The fraction of recommendations that should consist of new (explored) items.
- Return type
Dict
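An illustrative get_recommendations call combining the paging, biasing, and restriction options described above; the user name, column names, and item values are assumptions:

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder key
recs = client.get_recommendations(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'user_name': 'John Doe'},  # column mapped to USER_ID
    num_items=10,
    page=2,                                 # returns items ranked 11th through 20th
    scaling_factors=[{'column': 'VehicleType', 'values': ['SUV', 'Sedan'], 'factor': 1.4}],
    restrict_items=[{'column': 'VehicleType', 'values': ['SUV', 'Sedan', 'Hatchback']}],
)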
- get_personalized_ranking(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])
Returns a list of items with personalized promotions on them for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column name (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.
preserve_ranks (list) – List of dictionaries of format {“column”: “col0”, “values”: [“value0”, “value1”]}, where the ranks of items in query_data are preserved for all the items in “col0” with values “value0” and “value1”. This option is useful when the desired items are already being recommended in the desired order and their ranks need to be kept unchanged during recommendation generation.
preserve_unknown_items (bool) – If true, any items that are unknown to the model will not be reranked, and their original position in the query will be preserved.
scaling_factors (list) – Allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key “column” takes the name of the column (“col0”); the key “values” takes the list of items ([“value0”, “value1”]) towards which the model recommendations need to be biased; and the key “factor” takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}], then after the model produces item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or an item that always comes up and you want to demote it.
- Return type
Dict
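A hedged sketch of get_personalized_ranking using the two-key query_data shape described above; the user and movie identifiers are placeholders:

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder key
ranking = client.get_personalized_ranking(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={
        'user_id': 'u_123',                               # column mapped to USER_ID
        'movie_name': ['Movie A', 'Movie B', 'Movie C'],  # column mapped to ITEM_ID
    },
    preserve_unknown_items=True,  # keep unknown items in their original positions
)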
- get_ranked_items(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])
Returns a list of re-ranked items for a selected user when a list of items is required to be reranked according to the user’s preferences. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column name (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.
preserve_ranks (list) – List of dictionaries of format {“column”: “col0”, “values”: [“value0”, “value1”]}, where the ranks of items in query_data are preserved for all the items in “col0” with values “value0” and “value1”. This option is useful when the desired items are already being recommended in the desired order and their ranks need to be kept unchanged during recommendation generation.
preserve_unknown_items (bool) – If true, any items that are unknown to the model will not be reranked, and their original position in the query will be preserved.
scaling_factors (list) – Allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key “column” takes the name of the column (“col0”); the key “values” takes the list of items ([“value0”, “value1”]) towards which the model recommendations need to be biased; and the key “factor” takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}], then after the model produces item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or an item that always comes up and you want to demote it.
- Return type
Dict
- get_related_items(deployment_token, deployment_id, query_data, num_items=50, page=1, scaling_factors=[], restrict_items=[], exclude_items=[])
Returns a list of related items for a given item under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which related items are determined and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.
num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.
page (int) – The page number to be displayed. For example, if num_items is set to 10 and there are 50 recommended items in total, a ‘page’ value of 2 will display the items ranked 11th through 20th.
scaling_factors (list) – Allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key “column” takes the name of the column (“col0”); the key “values” takes the list of items ([“value0”, “value1”]) towards which the model recommendations need to be biased; and the key “factor” takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}], then after the model produces item probabilities, the probability of every SUV and Sedan in the list is multiplied by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or an item that always comes up and you want to demote it.
restrict_items (list) – Allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, “value3”, …]}. The key “column” takes the name of the column (“col0”) and the key “values” takes the list of items ([“value0”, “value1”, “value3”, …]) to which the recommendations should be restricted. For example, if the input to restrict_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}], the recommendations are restricted to SUVs and Sedans. This type of restriction is particularly useful when you know a certain list of items is relevant in a given scenario and you want to restrict the recommendations to that list.
exclude_items (list) – Allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, …]}. The key “column” takes the name of the column (“col0”) and the key “values” takes the list of items ([“value0”, “value1”]) to exclude from the recommendations. For example, if the input to exclude_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}], the resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful when you know a certain list of items is of no use in a given scenario and you don’t want to show them.
- Return type
Dict
- get_feature_group_rows(deployment_token, deployment_id, query_data)
- get_search_results(deployment_token, deployment_id, query_data)
TODO
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.
- Return type
Dict
- get_sentiment(deployment_token, deployment_id, document)
Returns the predicted sentiment for the given document.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
document (str) – # TODO
- Return type
Dict
- get_entailment(deployment_token, deployment_id, document)
TODO
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
document (str) – # TODO
- Return type
Dict
- get_classification(deployment_token, deployment_id, document)
Returns the predicted classification for the given document.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
document (str) – # TODO
- Return type
Dict
- get_summary(deployment_token, deployment_id, query_data)
Returns a json of the predicted summary for the given document. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘text’ mapped to mapping ‘DOCUMENT’ in our system).
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – Raw data dictionary containing the required document data; it must have the key ‘document’ whose value is the DOCUMENT-type text to summarize.
- Return type
Dict
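A minimal get_summary sketch; the document text is an invented placeholder:

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder key
summary = client.get_summary(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'document': 'Full text of the article to be summarized ...'},  # key 'document' holds the DOCUMENT text
)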
- predict_language(deployment_token, deployment_id, query_data)
Returns the predicted language of the given text.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (str) – # TODO
- Return type
Dict
- get_assignments(deployment_token, deployment_id, query_data, forced_assignments=None)
Get all positive assignments that match a query.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – specifies the set of assignments being requested.
forced_assignments (dict) – set of assignments to force and resolve before returning query results.
- Return type
Dict
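An illustrative get_assignments call; the query keys and forced assignments are assumptions about a hypothetical assignment/optimization use case:

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder key
assignments = client.get_assignments(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'date': '2022-01-01'},           # specifies which assignments are requested
    forced_assignments={'driver_7': 'route_3'},  # resolved before returning query results
)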
- check_constraints(deployment_token, deployment_id, query_data)
Check for any constraints violated by the overrides.
- Parameters
deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.
deployment_id (str) – The unique identifier to a deployment created under the project.
query_data (dict) – assignment overrides to the solution.
- Return type
Dict
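A matching check_constraints sketch validating the same hypothetical overrides before committing them:

from abacusai import ApiClient

client = ApiClient(api_key='YOUR_API_KEY')  # placeholder key
violations = client.check_constraints(
    deployment_token='DEPLOYMENT_TOKEN',
    deployment_id='DEPLOYMENT_ID',
    query_data={'driver_7': 'route_3'},  # assignment overrides to validate
)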
- abacusai.__version__ = 0.36.17