youchoose.data package

Submodules

youchoose.data.data_loading module

youchoose.data.data_processing module

Data processing library.

youchoose.data.data_processing.dataframe_split(df: pandas.core.frame.DataFrame, train_frac: float = 0.8, test_frac: float = 0.1) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Split a dataframe into training, validation, and testing sets.

Parameters:
  • df (pd.DataFrame) – A pandas dataframe with features as columns and examples as rows.
  • train_frac (float, optional) – Fraction of the data to use for training. Defaults to 0.80.
  • test_frac (float, optional) – Fraction of the data to use for testing. Defaults to 0.10.
Raises:

ValueError – If the training or testing fraction is not less than 1, or if their sum is not less than 1.

Returns:

A tuple of the training, validation, and testing dataframes.

Return type:

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
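
The documented contract (two fractions, a ValueError check, and a three-way tuple in training, validation, testing order) can be illustrated with a minimal sketch. This is not the library's implementation: split_sketch is a hypothetical stand-in, and both the shuffling step and the choice to give the validation set the remaining rows are assumptions.

```python
from typing import Tuple

import pandas as pd


def split_sketch(
    df: pd.DataFrame, train_frac: float = 0.8, test_frac: float = 0.1
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Hypothetical stand-in for dataframe_split (not the library's code)."""
    # Mirror the documented ValueError condition.
    if train_frac >= 1 or test_frac >= 1 or train_frac + test_frac >= 1:
        raise ValueError("Fractions must each be < 1 and sum to < 1.")

    # Assumption: rows are shuffled before being split.
    shuffled = df.sample(frac=1.0, random_state=0).reset_index(drop=True)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)

    train = shuffled.iloc[:n_train]
    test = shuffled.iloc[n_train:n_train + n_test]
    # Assumption: validation receives whatever is left over.
    validation = shuffled.iloc[n_train + n_test:]
    return train, validation, test


df = pd.DataFrame({"user_id": range(100), "item_id": range(100)})
train, validation, test = split_sketch(df)
```

With the default fractions, 100 rows would be divided 80/10/10 under these assumptions. Note the return order: validation comes before testing in the tuple.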

youchoose.data.data_processing.item_sets(df: pandas.core.frame.DataFrame, users: str = 'user_id', items: str = 'item_id') → dict

Generate sets for each user containing items that they have previously interacted with.

Parameters:
  • df (pd.DataFrame) – Dataframe containing user_id and item_id columns.
  • users (str, optional) – Column name for the users. Defaults to “user_id”.
  • items (str, optional) – Column name for the items. Defaults to “item_id”.
Returns:

Dictionary mapping each user to the set of items that they have previously interacted with.

Return type:

dict
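
To show the shape of the returned dictionary, here is a small sketch of the same idea in plain Python. item_sets_sketch is a hypothetical helper written for illustration; the library's own implementation may differ.

```python
from collections import defaultdict

import pandas as pd


def item_sets_sketch(df: pd.DataFrame, users: str = "user_id", items: str = "item_id") -> dict:
    """Hypothetical stand-in for item_sets (not the library's code)."""
    result = defaultdict(set)
    # Walk the two columns in parallel and collect each user's items.
    for user, item in zip(df[users], df[items]):
        result[user].add(item)
    return dict(result)


df = pd.DataFrame(
    {"user_id": ["a", "a", "b", "a"], "item_id": [1, 2, 2, 1]}
)
user_items = item_sets_sketch(df)
```

Because the values are sets, repeated interactions with the same item (user "a" and item 1 above) collapse into a single entry.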

youchoose.data.data_processing.list_to_indexed_dict(list_: list) → dict

Map a list of objects to a sorted dict of indexes.

Assign indexes to the distinct objects in a list and return a dictionary whose keys, in range(num unique objects), map to the objects.

Parameters:
  • list_ (list) – A list of objects to index.
Returns:

Sorted dictionary indexing unique objects in input list.

Return type:

dict

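
The index-to-object mapping described above can be sketched in one line of standard Python. indexed_dict_sketch is a hypothetical name, and the assumption here is that "sorted" refers to the natural ordering of the unique objects.

```python
def indexed_dict_sketch(list_: list) -> dict:
    """Hypothetical stand-in for list_to_indexed_dict (not the library's code)."""
    # Deduplicate, sort (assumed: natural ordering), then number from 0.
    return dict(enumerate(sorted(set(list_))))


index_to_item = indexed_dict_sketch(["b", "a", "c", "a"])
```

The keys cover range(3) because the input holds three distinct objects, even though it has four elements.
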
youchoose.data.data_processing.transform_data_ids(df: pandas.core.frame.DataFrame, user_col: str = 'user_id', item_col: str = 'item_id', weight_col: str = 'interaction', reweight: bool = True) → Tuple[pandas.core.frame.DataFrame, dict, dict]

Transform the item and user IDs into the indices needed during embedding.

Parameters:
  • df (pd.DataFrame) – Dataframe containing user_id and item_id columns.
  • user_col (str, optional) – Column name for the users. Defaults to “user_id”.
  • item_col (str, optional) – Column name for the items/products. Defaults to “item_id”.
  • weight_col (str, optional) – Column name for interaction metric. Defaults to “interaction”.
  • reweight (bool, optional) – Whether to convert the interaction values to binary (yes/no) interactions. Defaults to True.
Returns:

The transformed dataframe along with the lookup dicts used to translate between ID and index.

Return type:

Tuple[pd.DataFrame, dict, dict]
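
As a rough illustration of the transformation, the sketch below replaces raw IDs with contiguous integer indices and binarizes the interaction column. Several details are assumptions rather than documented behavior: the helper index_lookup is hypothetical, the lookup dicts are shown in the ID-to-index direction, and "binary" is taken to mean 1 for any positive interaction and 0 otherwise.

```python
import pandas as pd


def index_lookup(values) -> dict:
    """Hypothetical helper: map each distinct ID to a contiguous index."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


df = pd.DataFrame(
    {
        "user_id": ["u2", "u1", "u2"],
        "item_id": ["i9", "i9", "i3"],
        "interaction": [5, 0, 2],
    }
)

user_to_index = index_lookup(df["user_id"])
item_to_index = index_lookup(df["item_id"])

out = df.copy()
out["user_id"] = out["user_id"].map(user_to_index)
out["item_id"] = out["item_id"].map(item_to_index)
# Assumed meaning of reweight=True: any positive interaction becomes 1.
out["interaction"] = (out["interaction"] > 0).astype(int)
```

Contiguous indices of this kind are what an embedding table typically expects, since each index selects one row of the table.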

Module contents