# Insights

DataBake can generate insights in your dataset through a range of methods.

The simplest type of insight is defined with literal arguments, which give the parameters of a generation function constant values. For example, a column based on a normal distribution can be made to show an insight by changing the mean and standard deviation of that distribution.

The expressions which define the function's arguments are processed using `ast.literal_eval()`. The results are then passed to the function where the new values are generated. This allows some basic operators to be used while still allowing your dataset to be generated as quickly as possible. Some examples of the types of literals that can be used in this way are:

```
5
3.14
7.4E-9
[1, 2]
'abc'
3 + 1
```

For more details, see the Python documentation for `ast.literal_eval()`.
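As a rough illustration, the snippet below runs Python's standard `ast.literal_eval()` over the literals listed above. It is only a sketch of how such strings evaluate in plain Python, not DataBake's own code, and the helper name `parse_argument` is hypothetical. How an arithmetic expression such as `3 + 1` is handled depends on the Python version (releases from 3.7 onwards reject it in `ast.literal_eval()`), so the sketch reports a rejection rather than assuming a result.

```python
import ast


def parse_argument(text):
    """Hypothetical helper: turn an argument string into a Python value."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        # Anything that is not a plain literal is rejected.
        return f"rejected: {type(exc).__name__}"


for text in ["5", "3.14", "7.4E-9", "[1, 2]", "'abc'", "3 + 1", "age * 30"]:
    print(f"{text!r:12} -> {parse_argument(text)!r}")
```

An expression that names a column, such as `age * 30`, is not a literal, which is why it is rejected here.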

When you modify the variables, a small preview graph is generated to give a helpful in-browser plot of the shape of the expected distribution. Once you add the insight, this graph is instead derived from a preview of the dataset. However, if the resulting expression cannot be used as an argument for the function, the server returns an error and no preview is displayed.

For additional information about the types of functions provided and their arguments, see Columns.

## Choices

Another way of generating insights is by using the choice providers.

Before any bias is applied, each item has an equal chance of being generated. When you use a choice provider to determine a column's insight, you can change the ratios that control how likely each item is to be generated as a value. This can be done in addition to using any other providers.
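The effect of changing these ratios can be sketched with Python's standard `random.choices()`. This is an illustration of weighted sampling in general, not DataBake's implementation; the item names and weights are hypothetical.

```python
import random
from collections import Counter

items = ["tea", "coffee", "juice"]  # hypothetical choice items
weights = [6, 3, 1]                 # hypothetical ratios applied as a bias

rng = random.Random(42)             # fixed seed so the sketch is repeatable

# Without weights, every item is equally likely.
unbiased = Counter(rng.choices(items, k=10_000))
# With weights, items are drawn in roughly a 6:3:1 ratio.
biased = Counter(rng.choices(items, weights=weights, k=10_000))

print("unbiased:", dict(unbiased))
print("biased:  ", dict(biased))
```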

## Relationships

The most powerful aspect of insights is the ability to add relationships between columns to build dependent variables into your datasets. Using this feature, the arguments passed to the generation function are updated for each value. This has a substantial performance tradeoff compared to using constant expressions, but is necessary to create the most interesting datasets.

Suppose that you have already defined three columns: *age*, *average_product_price*, and *favourite_product*. You can now define a simple relationship between the age of the customers in your dataset and the price of the products they are buying.

This relationship is added by creating an insight for *average_product_price* and setting one of the arguments in the provider to be derived from another column. The arguments for each provider appear as forms inside the insights dialog window once a column is selected, and there you can type in the name of the column you wish to derive values from as a variable. For example, if you add an expression containing the column name `age` as an argument, then each time a value for *average_product_price* is generated, the value in the *age* column of the same row is used to calculate it.

More specifically, if *average_product_price* uses a normal distribution to generate values, a relationship could be added to make the average product price a multiple of age by entering `age * 30` into the *Mean* box. Once the dataset is generated, this would have the effect of simulating average purchase prices that rise as people grow older.
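The per-row behaviour described above can be sketched in plain Python. This is a minimal illustration rather than DataBake's generator: the age range, the standard deviation, and the function name `generate_rows` are all assumptions made for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def generate_rows(n, std_dev=50.0):
    """Generate (age, average_product_price) pairs, one row at a time."""
    rows = []
    for _ in range(n):
        age = rng.randint(18, 80)         # assumed range for the age column
        mean = age * 30                   # the 'age * 30' expression from the Mean box
        price = rng.gauss(mean, std_dev)  # normal distribution with a per-row mean
        rows.append((age, price))
    return rows


for age, price in generate_rows(5):
    print(f"age={age:2d}  average_product_price={price:8.2f}")
```

Because the mean is recomputed for every row, the expression has to be evaluated once per value, which is where the performance cost of relationships comes from.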

This insight could be visualised as a connection between the two variables in a directed graph.

Extending this idea further, more relationships can be added between *favourite_product* and the other two columns. Each of the two columns can act as a source on its own, and the two can also be combined with one another in a single expression.

These relationships are resolved before generation and used to work out the order in which columns should be generated. This imposes an important condition: the relationships between columns must form a directed acyclic graph, so that no variable depends on itself, either directly or through other columns.

DataBake will display a simplified version of these relationships in a graph on the left-hand side of the page under the heading **Column Relationships**. Here you will be able to see how the columns connect to one another, including the expression that dictates each relationship.
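One common way to derive a generation order from such a graph is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+) on the three example columns; it illustrates the general technique and is not taken from DataBake's source. The dependency mapping and the function name `generation_order` are assumptions for the example.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Each column maps to the set of columns its expressions depend on.
dependencies = {
    "age": set(),
    "average_product_price": {"age"},
    "favourite_product": {"age", "average_product_price"},
}


def generation_order(deps):
    """Return an order in which every column comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


print(generation_order(dependencies))

# Making age depend on favourite_product introduces a cycle,
# so no valid generation order exists.
cyclic = dict(dependencies, age={"favourite_product"})
try:
    generation_order(cyclic)
except CycleError as exc:
    print("cycle detected:", exc.args[1])
```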

## Editor

When editing column relationships in the insights dialog window, a few additional features are available. Arithmetic operators, including modulo and exponentiation, can be used in your expressions. Note, however, that there is an execution time limit for each cell, so you will not be able to generate datasets containing expressions like `9**9**9**9**9**9**9`, as they will fail to preview. The `==` comparison operator is also available, primarily for string comparisons in column relationships.
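In plain Python, a string comparison with `==` yields a boolean, which behaves as 0 or 1 in arithmetic. The sketch below uses that to switch a mean depending on the value of *favourite_product*. The product name `"laptop"`, the numbers, and the function name `mean_for` are hypothetical, and whether DataBake accepts this exact form of expression is an assumption not confirmed by this page.

```python
def mean_for(favourite_product):
    # Hypothetical expression: a base mean of 20, plus 500 when the comparison is true.
    return 20 + 500 * (favourite_product == "laptop")


print(mean_for("laptop"))
print(mean_for("socks"))
```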

## Functions

In order to provide flexibility in dataset creation a number of extra functions are available for use in insights.

### Builtins

Builtins are basic Python functions for commonly used operations.

| Function | Effect |
|---|---|
| `abs()` | Returns the absolute value of a numerical object. |
| `all()` | Returns true if all conditions in an iterable are met. |
| `any()` | Returns true if any conditions in an iterable are met. |
| `chr()` | Returns the string character of an integer. |
| `dir()` | Returns the list of names in the current scope. |
| `hash()` | Returns the hash value of the object. |
| `len()` | Returns the number of items of an object. |
| `max` | Returns the largest value in an object. |
| `min` | Returns the smallest value in an object. |
| `ord()` | Returns the integer representation of a Unicode character. |
| `pow()` | Returns the first argument to the power of the second. |
| `round()` | Returns the first value rounded to the number of digits specified by the second. |
| `sorted()` | Returns a sorted version of an object. |
| `sum()` | Returns the sum of all objects. |
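For reference, this is how a few of these builtins behave in plain Python. The list `prices` is a hypothetical set of values, and the snippet shows standard Python behaviour rather than anything specific to DataBake.

```python
prices = [19.99, -4.5, 120.0, 7.25]  # hypothetical values

print(abs(prices[1]))            # absolute value of -4.5
print(max(prices), min(prices))  # largest and smallest values
print(round(sum(prices), 1))     # total, rounded to one decimal place
print(sorted(prices))            # ascending order
print(len(prices), pow(2, 10), chr(65), ord("A"))
```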

### Type Conversion

Type conversions let you manipulate the kinds of data being passed into functions.

| Function | Effect |
|---|---|
| `bin()` | Converts an integer number to a binary string. |
| `bool` | Converts a value to a boolean. |
| `complex` | Converts a value to a complex number. |
| `float` | Converts a value to a floating point number. |
| `hex()` | Converts a value to its hexadecimal representation. |
| `int` | Converts a value to an integer. |
| `oct()` | Converts a value to its octal representation. |
| `str` | Converts a value to a string. |

### Statistical

Statistical functions let you draw values from distributions to be used as function arguments.

| Function | Effect |
|---|---|
| `normal` | Draws random samples from a normal distribution. |
| `triangular` | Draws random samples from a triangular distribution. |
| `uniform` | Draws random samples from a uniform distribution. |
| `poisson` | Draws random samples from a Poisson distribution. |
| `beta` | Draws random samples from a beta distribution. |