Database Access

This module contains functionality to comfortably access a SQL database.

mastersign.datascience.database.execute(sql, db_conn=None, *args, **kwargs)[source]

Execute a SQL statement, returning no data.

Parameters
  • sql – A string as a SQL statement.

  • db_conn – A SqlAlchemy connection string. (optional)

  • args – Additional positional arguments, passed to sqlalchemy.engine.Connection.execute().

  • kwargs – Additional keyword arguments, passed to sqlalchemy.engine.Connection.execute().

mastersign.datascience.database.load_query(query, db_conn=None, date=None, defaults=None, dtype=None, index=None, chunksize=4096, cachefile=None, compress_cache=False, **kwargs)[source]

Load data from an arbitrary SQL query.

Parameters
  • query – A string as a SQL query.

  • db_conn – A SqlAlchemy connection string. (optional)

  • date – A column name or an iterable with column names, or a dict with column names and date format strings, for parsing specific columns as datetimes. (optional)

  • defaults – A dict with column names and default values for NULL values. (optional) Can be used to fill columns with defaults before converting them to numeric data types with dtype. See pandas.DataFrame.fillna() for more details.

  • dtype – A dict with column names and NumPy datatypes or 'category'. (optional) See pandas.DataFrame.astype() for details.

  • index – A column name or an iterable with column names, which will be the index in the resulting DataFrame. (optional)

  • chunksize – The number of rows to load in a chunk before converting them into a Pandas DataFrame. (optional)

  • cachefile – A path to a file to cache the result data from the query. (optional) If the file already exists, the content of the file is returned instead of connecting to the database.

  • compress_cache – A switch to activate data compression for the cache file.

  • kwargs – Additional keyword arguments are passed to pandas.read_sql_query().

Returns

Pandas DataFrame

mastersign.datascience.database.load_scalar(query, db_conn=None, *args, **kwargs)[source]

Load a single scalar from an arbitrary SQL query.

Parameters
  • query – A string as a SQL query.

  • db_conn – A SqlAlchemy connection string. (optional)

  • args – Additional positional arguments, passed to sqlalchemy.engine.Connection.execute().

  • kwargs – Additional keyword arguments, passed to sqlalchemy.engine.Connection.execute().

Returns

A single value

mastersign.datascience.database.load_table(name, columns=None, where=None, group_by=None, limit=None, db_conn=None, date=None, defaults=None, dtype=None, index=None, chunksize=4096, cachefile=None, compress_cache=False, **kwargs)[source]

Load data from a SQL table.

Parameters
  • name – The name of the table.

  • columns – An iterable of column names. (optional)

  • where – A string with on condition or an iterable. (optional) The iterable forms a conjunction and can hold strings as conditions or nested iterables. The nested iterables form disjunctions and must hold strings with conditions.

  • group_by – A string as a GROUP-BY-clause or an iterable with multiple GROUP-BY-clauses. (optional)

  • limit – The maximum number of rows, or a pair with an row offset and the maximum number of rows. (optional)

  • db_conn – A SqlAlchemy connection string. (optional)

  • date – A column name or an iterable with column names, or a dict with column names and date format strings, for parsing specific columns as datetimes. (optional)

  • defaults – A dict with column names and default values for NULL values. (optional) Can be used to fill columns with defaults before converting them to numeric data types with dtype. See pandas.DataFrame.fillna() for more details.

  • dtype – A dict with column names and NumPy datatypes or 'category'. (optional) See pandas.DataFrame.astype() for more details.

  • index – A column name or an iterable with column names, which will be the index in the resulting DataFrame. (optional)

  • chunksize – The number of rows to load in a chunk before converting them into a Pandas DataFrame. (optional)

  • cachefile – A path to a file to cache the result data from the query. (optional) If the file already exists, the content of the file is returned instead of connecting to the database.

  • compress_cache – A switch to activate data compression for the cache file.

  • kwargs – Additional keyword arguments are passed to pandas.read_sql_query().

Returns

Pandas DataFrame

mastersign.datascience.database.set_default_db_conn(db_conn)[source]

Sets the default connection string for subsequent database queries.

Parameters

db_conn – A SqlAlchemy connection string.