youchoose.data package

Subpackages

Submodules

youchoose.data.data_loading module

youchoose.data.data_processing module

Data processing library.

youchoose.data.data_processing.dataframe_split(df: pandas.core.frame.DataFrame, train_frac: float = 0.8, test_frac: float = 0.1) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Split dataframe into training, testing, and validation sets.

Parameters:

df (pd.DataFrame) – A pandas dataframe with feature columns and examples as rows.
train_frac (float, optional) – Fraction of the data to use for training. Defaults to 0.80.
test_frac (float, optional) – Fraction of the data to use for testing. Defaults to 0.10.

Raises:	`ValueError` – If the training or testing fraction is not less than 1, or if their sum is not less than 1.
Returns:	A tuple of the training, validation, and testing dataframes.
Return type:	Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
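
As an illustration of the contract described above, here is a minimal sketch of a three-way split. It is not the library's implementation: the function name `split_frame_sketch` is hypothetical, and both the shuffling step and the choice to give the leftover rows to the validation set are assumptions that the documentation does not confirm.

```python
from typing import Tuple

import pandas as pd


def split_frame_sketch(
    df: pd.DataFrame, train_frac: float = 0.8, test_frac: float = 0.1
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Hypothetical stand-in for dataframe_split; not the library's code."""
    # Mirror the documented constraint on the two fractions.
    if train_frac >= 1 or test_frac >= 1 or train_frac + test_frac >= 1:
        raise ValueError("train_frac, test_frac, and their sum must be < 1")

    # Assumption: rows are shuffled before slicing (fixed seed for repeatability).
    shuffled = df.sample(frac=1.0, random_state=0)

    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)

    train = shuffled.iloc[:n_train]
    test = shuffled.iloc[n_train:n_train + n_test]
    # Assumption: whatever is left over becomes the validation set.
    valid = shuffled.iloc[n_train + n_test:]

    # Same order as the documented return value: training, validation, testing.
    return train, valid, test


frame = pd.DataFrame({"user_id": range(100), "item_id": range(100)})
train_df, valid_df, test_df = split_frame_sketch(frame)
print(len(train_df), len(valid_df), len(test_df))
```

With the default fractions, the 100-row example frame is divided into 80 training rows, 10 validation rows, and 10 testing rows.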

youchoose.data.data_processing.item_sets(df: pandas.core.frame.DataFrame, users: str = 'user_id', items: str = 'item_id') → dict

Generate sets for each user containing items that they have previously interacted with.

Parameters:

df (pd.DataFrame) – Dataframe containing user_id and item_id columns.
users (str, optional) – Name of the column containing user IDs. Defaults to “user_id”.
items (str, optional) – Name of the column containing item IDs. Defaults to “item_id”.

Returns:

Dictionary mapping each user to the set of items that they have previously interacted with.

Return type:

dict
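
The behaviour described above can be approximated with a pandas group-by. The following is only a sketch under assumptions: `item_sets_sketch` is a hypothetical name, and the library may build the dictionary differently.

```python
import pandas as pd


def item_sets_sketch(
    df: pd.DataFrame, users: str = "user_id", items: str = "item_id"
) -> dict:
    """Hypothetical stand-in for item_sets; not the library's code."""
    # Group rows by user and collapse each user's items into a set,
    # which also removes repeated interactions with the same item.
    return df.groupby(users)[items].apply(set).to_dict()


interactions = pd.DataFrame(
    {"user_id": [1, 1, 2, 2, 2], "item_id": [10, 11, 10, 10, 12]}
)
seen = item_sets_sketch(interactions)
print(seen[2])
```

Here user 2 interacted with item 10 twice, but the resulting set holds it only once alongside item 12.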

youchoose.data.data_processing.list_to_indexed_dict(list_: list) → dict

Map a list of objects to a sorted dict of indexes.

Assign indexes to the distinct objects in a list and return a dictionary whose keys, in range(number of unique objects), map to those objects.

Parameters:	list_ (list) – A list of objects to index.
Returns:	Sorted dictionary indexing unique objects in input list.
Return type:	dict
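
A short sketch of the indexing described above follows. The name `indexed_dict_sketch` is hypothetical, and ordering the unique objects by their natural sort order is an assumption about what "sorted" means here.

```python
def indexed_dict_sketch(list_: list) -> dict:
    """Hypothetical stand-in for list_to_indexed_dict; not the library's code."""
    # Assumption: duplicates are dropped and the remaining objects are
    # sorted, so index 0 maps to the smallest object.
    unique_sorted = sorted(set(list_))
    # enumerate yields (index, object) pairs with indexes 0..n-1.
    return dict(enumerate(unique_sorted))


index_to_item = indexed_dict_sketch(["b", "a", "c", "a"])
print(index_to_item)  # {0: 'a', 1: 'b', 2: 'c'}
```

This approach requires the objects to be hashable and mutually comparable, which holds for typical string or integer IDs.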

youchoose.data.data_processing.transform_data_ids(df: pandas.core.frame.DataFrame, user_col: str = 'user_id', item_col: str = 'item_id', weight_col: str = 'interaction', reweight: bool = True) → Tuple[pandas.core.frame.DataFrame, dict, dict]

Transform the item and user IDs into the indices needed during embedding.