youchoose.data package
Subpackages
- youchoose.data.example_datasets package
- youchoose.data.ingestion package
- Submodules
- youchoose.data.ingestion.csv module
- youchoose.data.ingestion.graph module
- youchoose.data.ingestion.helper_functions module
- youchoose.data.ingestion.image module
- youchoose.data.ingestion.nosql module
- youchoose.data.ingestion.sql module
- youchoose.data.ingestion.text module
- youchoose.data.ingestion.video module
- Module contents
Submodules
youchoose.data.data_loading module
youchoose.data.data_processing module
Data processing library.
youchoose.data.data_processing.dataframe_split(df: pandas.core.frame.DataFrame, train_frac: float = 0.8, test_frac: float = 0.1) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Split a dataframe into training, validation, and testing sets.
Parameters: - df (pd.DataFrame) – A pandas dataframe with features as columns and examples as rows.
- train_frac (float, optional) – Fraction of the data to use for training. Defaults to 0.80.
- test_frac (float, optional) – Fraction of the data to use for testing. Defaults to 0.10.
Raises: ValueError – Raised unless the training and testing fractions are each less than 1 and their sum is less than 1.
Returns: A tuple of the training, validation, and testing dataframes.
Return type: Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
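The split described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the name `split_sketch`, the `seed` parameter, the shuffle before slicing, and the choice to give the validation set whatever remains after the training and testing rows are all assumptions.

```python
from typing import Tuple

import pandas as pd


def split_sketch(
    df: pd.DataFrame,
    train_frac: float = 0.8,
    test_frac: float = 0.1,
    seed: int = 0,  # hypothetical parameter, added for reproducibility
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Hypothetical stand-in for dataframe_split."""
    # Mirror the documented ValueError condition.
    if train_frac >= 1 or test_frac >= 1 or train_frac + test_frac >= 1:
        raise ValueError(
            "train_frac and test_frac must each be < 1 and sum to < 1"
        )

    # Assumption: rows are shuffled before being sliced.
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)

    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)

    train = shuffled.iloc[:n_train]
    test = shuffled.iloc[n_train:n_train + n_test]
    # Assumption: validation receives the remaining rows.
    valid = shuffled.iloc[n_train + n_test:]

    # Same order as the documented return value.
    return train, valid, test
```

With the default fractions, a 100-row dataframe would yield 80 training rows, 10 validation rows, and 10 testing rows under these assumptions.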
youchoose.data.data_processing.item_sets(df: pandas.core.frame.DataFrame, users: str = 'user_id', items: str = 'item_id') → dict
Generate, for each user, the set of items that they have previously interacted with.
Parameters: - df (pd.DataFrame) – Dataframe containing user_id and item_id columns.
- users (str, optional) – Name of the column holding the user IDs. Defaults to “user_id”.
- items (str, optional) – Name of the column holding the item IDs. Defaults to “item_id”.
Returns: Dictionary mapping each user to the set of items that they have previously interacted with.
Return type: dict
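The mapping described above can be sketched with a pandas group-by. This is an illustrative equivalent rather than the library's code; the name `item_sets_sketch` is hypothetical.

```python
import pandas as pd


def item_sets_sketch(
    df: pd.DataFrame, users: str = "user_id", items: str = "item_id"
) -> dict:
    """Hypothetical stand-in for item_sets."""
    # Iterating a grouped column yields (user, Series of that user's items).
    return {user: set(group) for user, group in df.groupby(users)[items]}


interactions = pd.DataFrame(
    {"user_id": ["a", "a", "b"], "item_id": [1, 2, 2]}
)
seen = item_sets_sketch(interactions)
```

Here `seen` maps user `"a"` to the items 1 and 2, and user `"b"` to item 2.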
youchoose.data.data_processing.list_to_indexed_dict(list_: list) → dict
Map a list of objects to a sorted dict of indexes.
Assign indexes to the distinct objects in a list and return a dictionary whose keys, in range(number of unique objects), map to those objects.
Parameters: list_ (list) – A list of objects to index.
Returns: Sorted dictionary indexing the unique objects in the input list.
Return type: dict
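A minimal sketch of this indexing, assuming the objects are hashable and mutually orderable and that indexes follow the sorted order of the unique objects (the documentation does not spell out the ordering, so this is an assumption). The name `indexed_dict_sketch` is hypothetical.

```python
def indexed_dict_sketch(list_: list) -> dict:
    """Hypothetical stand-in for list_to_indexed_dict."""
    # Deduplicate, sort, then number the unique objects from 0 upward.
    return dict(enumerate(sorted(set(list_))))


lookup = indexed_dict_sketch(["b", "a", "b", "c"])
```

Under these assumptions `lookup` has the keys 0, 1, and 2, mapping to `"a"`, `"b"`, and `"c"` respectively.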
youchoose.data.data_processing.transform_data_ids(df: pandas.core.frame.DataFrame, user_col: str = 'user_id', item_col: str = 'item_id', weight_col: str = 'interaction', reweight: bool = True) → Tuple[pandas.core.frame.DataFrame, dict, dict]
Transform the item and user IDs into the indices needed during embedding.
Parameters: - df (pd.DataFrame) – Dataframe containing user_id and item_id columns.
- user_col (str, optional) – Column name for the users. Defaults to “user_id”.
- item_col (str, optional) – Column name for the items/products. Defaults to “item_id”.
- weight_col (str, optional) – Column name for interaction metric. Defaults to “interaction”.
- reweight (bool, optional) – Whether to convert the interaction values to binary (yes/no) interactions. Defaults to True.
Returns: The transformed dataframe along with the lookup dicts used to translate between ID and index.
Return type: Tuple[pd.DataFrame, dict, dict]
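The transformation can be sketched as below. Several details are assumptions rather than documented behavior: the name `transform_ids_sketch`, the direction of the returned lookups (index to original ID), the sorted ordering of IDs, and the rule that any interaction greater than zero counts as a binary "yes".

```python
from typing import Tuple

import pandas as pd


def transform_ids_sketch(
    df: pd.DataFrame,
    user_col: str = "user_id",
    item_col: str = "item_id",
    weight_col: str = "interaction",
    reweight: bool = True,
) -> Tuple[pd.DataFrame, dict, dict]:
    """Hypothetical stand-in for transform_data_ids."""
    out = df.copy()

    # Assumption: lookups map a contiguous index back to the original ID.
    user_lookup = dict(enumerate(sorted(out[user_col].unique())))
    item_lookup = dict(enumerate(sorted(out[item_col].unique())))

    # Invert the lookups to replace each ID with its index.
    user_index = {uid: idx for idx, uid in user_lookup.items()}
    item_index = {iid: idx for idx, iid in item_lookup.items()}
    out[user_col] = out[user_col].map(user_index)
    out[item_col] = out[item_col].map(item_index)

    if reweight:
        # Assumption: any positive interaction becomes 1, otherwise 0.
        out[weight_col] = (out[weight_col] > 0).astype(int)

    return out, user_lookup, item_lookup


raw = pd.DataFrame(
    {
        "user_id": ["u2", "u1", "u2"],
        "item_id": ["x", "y", "y"],
        "interaction": [3, 0, 1],
    }
)
indexed, users, items = transform_ids_sketch(raw)
```

Contiguous integer indices are what an embedding table typically expects as row positions, which is why the IDs are replaced rather than kept alongside the indices in this sketch.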