# Columns¶

A key ingredient of DataBake is the columns tab, which allows users to define the columns of their data schema and how they are generated as well as any post-generation operations to apply. Columns can be selected from a pool of pre-defined variants or defined explicitly as custom columns. Currently there is a limit of 20 columns that can be added to a table.

## Pre-defined¶

A number of pre-defined column types are available that can be easily added to your schema. These column types have been selected based on the types of customer data companies will typically collect. These columns can also be tweaked through the insight tab - in the same way as custom columns and can be renamed during creation.

## Custom¶

Alternatively, column types can be defined manually in custom mode. From the custom tab, in addition to specifying a column name, you can choose your providers and wrappers to define the ways in which the data will be generated. Values for the column can then be tweaked with Insights.

### Providers¶

Providers are functions which generate random values for the columns. They allow preformatted standard strings or numerical values to be added to a dataset with minimal configuration.

### Basic¶

Basic providers are simple providers for generating specific kinds of data.

Function | Effect |
---|---|

Custom | Returns a single value, very useful when combined with Insights. |

ISO Date | Generates a timestamp between the 1st of January 1970 and the time of generation. |

String | Returns a string specified by the formatting: ```
? will be replaced a random letter
# will be replaced a random digit
E.g. ####-???-ABCD
=> '7432-YGL-ABCD'
``` |

Location | Generates a random latitude longitude pair. |

### Profile¶

Profile providers relate to personal information about individuals commonly used in datasets.

Provider | Effect |
---|---|

Address | Generates a random address. |

Job | Generates a random job title. |

Last Name | Generates a random last name. |

Generates a random email address. | |

Name | Genetates a random full name. |

Phone Number | Generates a random phone number. |

Social Security Number | Generates a random social security number (or national equivalent). |

For all of these providers an optional locale can be set which will generate values specific to that location.

### Distributions¶

These providers will sample a distribution to generate random values. Parameters can be tweaked through Insights, but only a single sample can be returned per entry.

Distribution | Effect |
---|---|

Uniform | Generates values from a uniform distribution (see `numpy.random.uniform()` ). |

Normal | Generates values from a normal distribution (see `numpy.random.normal()` ). |

Log Normal | Generates values from a log normal distribution (see `numpy.random.lognormal()` ). |

Beta | Generates values from a beta distribution (see `numpy.random.beta()` ). |

Gamma | Generates values from a gamma distribution (see `numpy.random.gamma()` ). |

### Choice Distributions¶

Choice distributions are some of the most convoluted but powerful providers which allow relationships to applied from continuous values to discrete ones.

Distribution | Effect |
---|---|

Choice Distribution | Bisect a list at a point from a uniform distribution. |

Beta Choice Distribution | Bisect a list at a point from a beta distribution. |

Triangular Choice Distribution | Bisect a list at a point from a triangular distribution. |

## Wrappers¶

Wrappers allow for a series of operations to be applied to each value generated for a column. These will be applied in the sequence in which they are added.

Wrapper | Effect |
---|---|

add | Offset the value by a float or int. |

multiply | Multiply the value by a float or int. |

max | Filter out results greater than a maximum. |

min | Filter out results smaller than a minimum. |

round | Reduce a float value to a number of digits after the decimal point. |

If the requested operation cannot be applied to the value a None type will be returned instead.