This quickstart tutorial takes you through the basic process of creating a dataset with DataBake. You will learn how to specify columns and their relationships, preview your dataset, and finally export it.
While there are a number of template projects available to view, for this tutorial we will create a blank project and build our dataset from there. To get started, enter a project name under the ‘Create a new project’ heading and click ‘Let’s go’. This will enable you to track your projects and change the datasets later.
Creating a new project takes you to the main page, where you can define the columns and insights of your personalised generated dataset. The title of your new project appears on the top left-hand side, with a list of the tables in the project below it. Using this list, you can rename the table you are about to generate; its name becomes the name of the output file when you export your dataset.
You will notice a help icon on the top right of the main body and in various other places throughout the site. It leads to the documentation for that section, allowing you to quickly gain more insight into the process. If the documentation does not help, you can also send a support ticket through the tab on the right-hand side of the page.
To the right of the project name, you will see four tabs titled ‘Columns’, ‘Insights’, ‘Preview’ and ‘Export’ respectively. These tabs will lead you through the process of generating a dataset using DataBake. Select the ‘Columns’ tab to begin defining the columns of our table.
Firstly, define the column variables by clicking the ‘Add’ button underneath the ‘Columns’ tab. A dialog box will appear containing a drop-down menu with a range of pre-defined columns arranged by group. There is also the option to generate your own columns based on a custom specification.
For this example, the first column we will add is a pre-defined one called Name, which creates a column filled with randomly generated full names. Click ‘Submit’ and a table with ‘Name’ in the column field will appear. You will notice the option to pick a localisation setting for your generated names. Ignore this for now; for more information, see the profile providers sub-section of the Columns documentation.
Next, click ‘Add’ again, scroll down the drop-down menu to demographic data and select Age. Here you will notice that column values are calculated from a numerical distribution, and a histogram of values is displayed under the heading ‘Sample Data’. This histogram is dynamically generated by the browser and forms a rough approximation of the eventual output. Click ‘Submit’ and the ‘Age’ column will appear in the same table as ‘Name’.
While pre-defined columns are easy to use, custom columns allow a great deal more flexibility for data generation. To add a custom column, follow the same procedure as before, but select custom instead of pre-defined in the column creation dialog. Type an example column name, such as MagicNumber, then click on the ‘Provider’ drop-down menu and select ‘Normal’. This will add a third column to the table, generated from another normal distribution.
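To illustrate why a histogram built from a sample is only a rough approximation of the eventual output, here is a minimal Python sketch. It is not DataBake's implementation: the normal distribution, its mean and standard deviation, the clamping range, the bin width and the function names are all assumptions made for the example.

```python
import random
from collections import Counter

BIN_WIDTH = 10  # assumed bin width, for illustration only


def sample_ages(n, mean=40.0, sd=12.0, seed=1):
    """Draw n ages from an assumed normal distribution, clamped to 0-100."""
    rng = random.Random(seed)
    ages = []
    for _ in range(n):
        age = round(rng.gauss(mean, sd))
        ages.append(min(max(age, 0), 100))
    return ages


def histogram(values, bin_width=BIN_WIDTH):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


ages = sample_ages(500)
bins = histogram(ages)

# Print a crude text histogram: one '#' per five sampled values.
for start, count in bins.items():
    print(f"{start:3d}-{start + BIN_WIDTH - 1:<3d} {'#' * (count // 5)}")
```

Changing the seed or the sample size changes the bar heights slightly, which is the sense in which a sample histogram only approximates the full dataset.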
Now that we have defined our columns, we can define the relationships between them. Click on the ‘Insights’ tab to begin.
Click ‘Add’ and another dialog box will appear, which allows you to define how a column’s values are calculated. Select ‘MagicNumber’ from the columns drop-down menu; this will be the dependent column in our new relationship. In the ‘Mean’ box, begin to type
Age ** 2
As you type ‘Age’, a blue box will appear below. Click on it and the text will automatically update to
Age ** 2
Clicking ‘Submit’ will return you to the Insights tab, where you can see the connection between ‘Age’ and ‘MagicNumber’ in the column relationships box on the left-hand side.
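Conceptually, this insight makes the mean of each row's MagicNumber depend on that row's Age. The following Python sketch shows the idea only and is not how DataBake generates data: the age range, the standard deviation of 1 and the function name generate_rows are assumptions chosen for illustration.

```python
import random


def generate_rows(n, seed=0):
    """Generate n rows where MagicNumber ~ Normal(mean=Age ** 2, sd=1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randint(18, 90)  # assumed age range, for illustration only
        magic_number = rng.gauss(age ** 2, 1.0)  # assumed standard deviation of 1
        rows.append({"Age": age, "MagicNumber": magic_number})
    return rows


rows = generate_rows(100)

# Show the first few rows; each MagicNumber lies close to Age squared.
for row in rows[:3]:
    print(row)
```

Because the mean is recomputed for every row, older ages produce much larger MagicNumber values, and the generated values are decimals rather than whole numbers.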
You can now click on ‘Preview’ to view your dataset and insights in a table and charts. The default preview tab is the charts, which display histograms and bar charts built from a sample of the dataset. For more information see the Preview section of the documentation.
If you are interested in seeing the raw data, the data tab shows a preview of 100 rows.
As we can see from the table, the MagicNumber column contains decimal values, and ideally we would want it to contain integers instead. To achieve this, return to the ‘Columns’ tab and add an int wrapper to the MagicNumber column, which will convert all generated values to integers. Alternatively, we could have rounded all the results, or added a constant to each of them.
When we are finally happy with our dataset, select the ‘Export’ tab and choose the number of rows you wish to generate. A download button will appear once generation is complete.
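The difference between these options can be sketched in plain Python. This tutorial does not state whether DataBake's int wrapper truncates or rounds, so the sketch below assumes Python's own semantics, and the sample values are made up.

```python
values = [1296.83, 400.27, 2499.51]  # made-up sample MagicNumber values

# Python's int() drops the fractional part (truncates towards zero).
truncated = [int(v) for v in values]

# round() with no second argument returns the nearest integer.
rounded = [round(v) for v in values]

# Adding a constant shifts every value but leaves it a decimal.
shifted = [v + 10 for v in values]

print(truncated)
print(rounded)
print(shifted)
```

Under these assumptions, truncation and rounding give different integers whenever the fractional part is above one half, so it is worth checking the preview after applying a wrapper.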