youchoose.data.ingestion package¶
Submodules¶
youchoose.data.ingestion.csv module¶
Library for extracting data from csv files.
youchoose.data.ingestion.graph module¶
Library for reading and writing graph-structured data to and from flat files or graph databases (Neo4j).
youchoose.data.ingestion.helper_functions module¶
Helper functions used to connect to local and remote data sources.
-
youchoose.data.ingestion.helper_functions.get_env_parameters() → dict¶ Load in environment variables for database and ssh connections.
Returns: A dictionary of the variables needed for ssh and database connections.
Return type: dict
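As a rough illustration of what such a loader might look like, the sketch below gathers the variables documented for ssh_tunnel from a mapping such as os.environ. The name load_connection_parameters and the decision to cast the two ports to int are assumptions made for this example, not the package's actual implementation, and reading a .env file is left out.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the ssh_tunnel documentation in this module.
REQUIRED_KEYS = (
    "HOST_IP", "SSH_PORT", "HOST_USER",
    "DB_HOST", "DB_PORT", "DB_USER", "DB_PASSWORD", "DB_NAME",
)
INT_KEYS = {"SSH_PORT", "DB_PORT"}


def load_connection_parameters(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical loader: gather ssh and database settings into a dict."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if key not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    # Ports are documented as int, so cast them; everything else stays a string.
    return {
        key: int(env[key]) if key in INT_KEYS else env[key]
        for key in REQUIRED_KEYS
    }
```

Failing early with the full list of missing names tends to be easier to debug than a connection error raised later by the tunnel or the database driver.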
-
youchoose.data.ingestion.helper_functions.ssh_tunnel(private_key)¶ Connect to a host computer through a ssh tunnel.
Use this to connect to a database that is not accessible through a public IP address. To open the connection, the following variables must either be exported as environment variables or placed in a .env file in the project’s root directory.
HOST_IP (IP address) - IP for the host to tunnel to.
SSH_PORT (int) - Open port on host to ssh through.
HOST_USER (str) - Username on host computer.
DB_HOST - Hostname/endpoint of the postgres database.
DB_PORT (int) - Open port on database for connection.
DB_USER - Username for the database.
DB_PASSWORD - Password for the database user.
DB_NAME - Name of the database to connect to.
Parameters: private_key (paramiko.RSAKey) – RSA key used to connect with host computer.
Returns: The connected ssh tunnel to a host computer.
Return type: tunnel (sshtunnel.SSHTunnelForwarder)
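One plausible way the documented variables could map onto sshtunnel.SSHTunnelForwarder is sketched below. The helpers tunnel_arguments and open_tunnel are hypothetical names introduced for illustration, and the mapping (host variables for the ssh endpoint, database variables for the remote bind address) is an assumption about how ssh_tunnel works internally. The sshtunnel import is deferred so that the argument mapping can be inspected without the package installed.

```python
def tunnel_arguments(params: dict, private_key) -> dict:
    """Map the documented variables onto SSHTunnelForwarder keyword arguments."""
    return {
        # Host that accepts the ssh connection.
        "ssh_address_or_host": (params["HOST_IP"], int(params["SSH_PORT"])),
        "ssh_username": params["HOST_USER"],
        "ssh_pkey": private_key,
        # Database endpoint as seen from the ssh host.
        "remote_bind_address": (params["DB_HOST"], int(params["DB_PORT"])),
    }


def open_tunnel(params: dict, private_key):
    """Start a tunnel; requires the third-party sshtunnel package."""
    from sshtunnel import SSHTunnelForwarder  # imported lazily on purpose

    tunnel = SSHTunnelForwarder(**tunnel_arguments(params, private_key))
    tunnel.start()
    return tunnel
```

Once started, the forwarder exposes the local end of the tunnel as tunnel.local_bind_port, and tunnel.stop() closes it.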
youchoose.data.ingestion.image module¶
Library for loading in data from images.
youchoose.data.ingestion.nosql module¶
Library for reading and writing data in a NoSQL database.
youchoose.data.ingestion.sql module¶
Library of functions used to connect to and query a SQL database. Connections can be either local or remote and are made using SQLAlchemy. An SSH tunnel can be set up if the remote database is not directly accessible.
-
class youchoose.data.ingestion.sql.SQLDatabase(db_type='psql', db_name='', engine=None, tunnel=None)¶ Bases: object
SQLDatabase is a class used to connect to a relational database using sqlalchemy and an env file containing the database credentials.
Some more info about the class attributes and functions.
-
close()¶ Shut down the database connection and the ssh tunnel, if open.
-
get_dataframe(query: str) → pandas.core.frame.DataFrame¶ Execute the query on the connected database and return a pandas dataframe.
Parameters: query (str) – SQL query
Returns: Results of the sql query returned as a dataframe with headings included.
Return type: queried_df (pd.DataFrame)
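The behaviour described here, a query result returned as a dataframe whose headings are the column names, can be illustrated with pandas.read_sql_query. This is a sketch only: it uses an in-memory SQLite database in place of the PostgreSQL connection, the videos table is made up for the example, and whether get_dataframe calls read_sql_query internally is an assumption.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the real PostgreSQL connection.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT)")
connection.executemany(
    "INSERT INTO videos (id, title) VALUES (?, ?)",
    [(1, "intro"), (2, "tutorial")],
)
connection.commit()

# Column names from the result set become the dataframe headings.
queried_df = pd.read_sql_query(
    "SELECT id, title FROM videos ORDER BY id", connection
)
connection.close()
```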
-
save_layout(filename: str)¶ Save the layout of the database to file.
Parameters: filename (str) – Name of the output file. Supported file types are png, dot, er (markdown), and pdf.
-
-
youchoose.data.ingestion.sql.psql_engine(tunnel=None)¶ Create a SQLAlchemy engine used for creating the database connection.
Parameters: tunnel (sshtunnel.SSHTunnelForwarder, optional) – Connect using an opened ssh tunnel if needed. Defaults to None.
Returns: The SQLAlchemy engine used to create the database connection.
Return type: Engine
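To show how the optional tunnel might change the connection target, the sketch below builds a PostgreSQL URL of the form SQLAlchemy accepts. The helper postgres_url and its local_port argument are hypothetical; the assumption is that, when a tunnel is open, the engine connects to the tunnel's local end on 127.0.0.1 rather than to DB_HOST directly.

```python
from urllib.parse import quote_plus


def postgres_url(params: dict, local_port=None) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL (hypothetical helper)."""
    if local_port is None:
        # Direct connection to the database endpoint.
        host, port = params["DB_HOST"], params["DB_PORT"]
    else:
        # Through a tunnel, traffic goes to the locally bound end of the forward.
        host, port = "127.0.0.1", local_port
    # Escape credentials so characters such as '@' or '/' do not break the URL.
    user = quote_plus(params["DB_USER"])
    password = quote_plus(params["DB_PASSWORD"])
    return f"postgresql://{user}:{password}@{host}:{port}/{params['DB_NAME']}"
```

The resulting string can be passed to sqlalchemy.create_engine; with an open SSHTunnelForwarder, tunnel.local_bind_port would supply local_port.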
youchoose.data.ingestion.text module¶
Library for extracting features from text files.
youchoose.data.ingestion.video module¶
Library for extracting data from video files.