File Storage

This module contains functions for reading Pandas DataFrames from files and writing them to files.

mastersign.datascience.files.read_parquet(filename, columns=None, index=None)

Read the content of a Parquet file into a Pandas DataFrame.

Parameters:

filename – A path to a Parquet file.
columns – A list of column names to load. (optional) If None is given, all columns from the file are read.
index – A column name or a list of column names to use as the index of the resulting DataFrame. (optional) By default, the columns marked as index in the metadata of the file are used as the index of the DataFrame. If no columns are marked as index, a simple incremental integer index is created.

Returns:

A Pandas DataFrame.
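
The index rule described for read_parquet can be summarized as a small decision function. The following is only a sketch of the documented behaviour, not the library's actual code: the names resolve_index and metadata_index are hypothetical and do not appear in the module.

```python
from typing import List, Optional, Sequence, Union


def resolve_index(
    index: Optional[Union[str, Sequence[str]]],
    metadata_index: Optional[Sequence[str]],
) -> Optional[List[str]]:
    """Return the columns to use as index, or None for a plain integer index.

    Hypothetical helper: it mirrors the documented rule, nothing more.
    """
    if index is not None:
        # An explicit argument is used as given; accept one name or a list.
        return [index] if isinstance(index, str) else list(index)
    if metadata_index:
        # Default: the columns marked as index in the file metadata.
        return list(metadata_index)
    # No index columns known: an incremental integer index is created.
    return None
```

Under these assumptions, a call such as read_parquet("data.parquet", columns=["id", "value"], index="id") would correspond to the first branch; the file name and column names here are made up for illustration.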

mastersign.datascience.files.write_parquet(data: DataFrame, filename, compress=False, append=False)

Write a Pandas DataFrame into a Parquet file.

Parameters:

data – A Pandas DataFrame.
filename – A path to the target Parquet file. If the file already exists and append is False, it is overwritten.
compress – A switch to activate GZIP compression. (optional)
append – A switch to append the DataFrame to the file if it already exists. (optional) The schema of the DataFrame must match the existing data in the file.
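
How filename and append interact can be sketched as a small decision function, together with a simple schema comparison. Both helpers, write_mode and schemas_match, are hypothetical illustrations and not part of the module. The sketch assumes that a missing file is simply created even when append is True, and that a schema matches when column names and types agree in the same order; the documentation above does not state either detail.

```python
from typing import List, Tuple

Schema = List[Tuple[str, str]]  # (column name, type name) pairs


def write_mode(file_exists: bool, append: bool) -> str:
    """Describe what a write would do to the target file (hypothetical helper)."""
    if not file_exists:
        # Assumption: nothing to append to, so the file is created.
        return "create"
    # Documented: an existing file is overwritten unless append is set.
    return "append" if append else "overwrite"


def schemas_match(existing: Schema, new: Schema) -> bool:
    """Assumed meaning of 'schema must match': same names and types, same order."""
    return list(existing) == list(new)
```

A call such as write_parquet(df, "data.parquet", compress=True, append=True) would then fall into the "append" case when data.parquet already exists; df and the file name are placeholders.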