The Estimating Probabilities from External Data dialog is used to select a CSV data file on which you'd like DPL to conduct tests and estimate probabilities. Before importing the CSV into DPL, clean up your data (e.g., specify column names and ensure the target event is the last column).
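The cleanup described above happens outside DPL, so any tool will do. As a minimal sketch, the Python snippet below trims stray whitespace and moves the target column to the last position. The column names (`Outcome`, `Weather`, `Season`) and the helper `move_target_last` are hypothetical and used only for illustration.

```python
import csv
import io

def move_target_last(rows, target):
    """Return rows with the `target` column moved to the final position.

    `rows` is a list of lists whose first entry is the header row.
    Whitespace around headings and values is stripped along the way.
    """
    header, *body = rows
    header = [name.strip() for name in header]
    idx = header.index(target)  # raises ValueError if the target is missing
    order = [i for i in range(len(header)) if i != idx] + [idx]
    cleaned_header = [header[i] for i in order]
    cleaned_body = [[row[i].strip() for i in order] for row in body]
    return [cleaned_header] + cleaned_body

# Hypothetical raw data: the target event "Outcome" is in the first column.
raw = "Outcome, Weather ,Season\nWin,Sunny,Summer\nLose,Rainy,Winter\n"
rows = list(csv.reader(io.StringIO(raw)))
cleaned = move_target_last(rows, "Outcome")

# Serialize the cleaned table back to CSV text.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(cleaned)
cleaned_csv = out.getvalue()
```

In practice you would write `cleaned_csv` to a file and make sure that file is closed before pointing DPL at it.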
To Estimate Probabilities from External Data (CSV):
- Select Data | Learning | Estimate Probabilities
- Within the Event data file section, browse for the desired .CSV file (Note: the file must be closed in order for DPL to be able to conduct analyses on it)
- Within the Nodes for columns/attribute in the data file section, choose whether you want DPL to create new chance nodes from the columns of data or use chance nodes that already exist in your model (Note: chance node outcomes must match those contained in the data file)
- Within the Probabilities, influence arcs, and tests section, choose how DPL should conduct the tests and generate the probabilities and influence arcs in the model. You can choose to have no tests/arcs generated (only the marginal probabilities will be defined for each event), have DPL automatically determine how many tests/arcs are needed, or explicitly specify the desired number of tests using the spin box.
- Within the Optimization box, you can choose from two levels of testing. With the default, Steepest ascent, DPL chooses arcs one at a time based on the variables that provide the most information on the target event. The Exhaustive search setting is only available if you have explicitly set the number of tests you'd like DPL to conduct on the dataset. This setting tells DPL to look at all of the variables at once in order to find the best combination of tests for the number specified.
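To illustrate the difference between the two Optimization settings, here is a rough Python sketch of a one-at-a-time (greedy) selection versus a search over all combinations of a fixed size. It is not DPL's implementation: the use of mutual information as the measure of "information on the target event" is an assumption, and the function names and the toy dataset are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

def information(rows, features, target):
    """Mutual information (bits) between the joint `features` and `target`."""
    if not features:
        return 0.0
    joint = [tuple(r[f] for f in features) for r in rows]
    t = [r[target] for r in rows]
    return entropy(joint) + entropy(t) - entropy(list(zip(joint, t)))

def steepest_ascent(rows, candidates, target, k):
    """Greedy: add one variable at a time, keeping the best so far."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        best = max(remaining, key=lambda c: information(rows, chosen + [c], target))
        chosen.append(best)
    return chosen

def exhaustive(rows, candidates, target, k):
    """Score every combination of k variables and return the best one."""
    best = max(combinations(candidates, k),
               key=lambda combo: information(rows, list(combo), target))
    return list(best)

# Toy data (assumed): T depends on A and B together; C is a noisy copy of T.
data = [(0, 0, 0, 0), (0, 0, 0, 0), (0, 1, 1, 1), (0, 1, 1, 1),
        (1, 0, 1, 1), (1, 0, 0, 1), (1, 1, 0, 0), (1, 1, 1, 0)]
rows = [dict(zip("ABCT", r)) for r in data]

greedy_pick = steepest_ascent(rows, ["A", "B", "C"], "T", 2)
full_pick = exhaustive(rows, ["A", "B", "C"], "T", 2)
```

On this toy dataset the greedy search commits to `C` first because it is the most informative variable on its own, while the exhaustive search finds that `A` and `B` together determine `T` completely. The trade-off is cost: the exhaustive search scores every combination of the requested size, which grows quickly with the number of columns.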
Once you've chosen your settings, click OK to have DPL analyze the dataset, conduct the desired number of tests, estimate the probabilities, and add nodes/arcs to the model. DPL will notify you via a dialog that the import is complete. The completion dialog also reports the number of chance nodes for which probabilities were calculated and lists percentages for Target predicted and Information.
When you look at the model you'll find that a chance node has been added for each input column. Influence arcs have also been added to the model where appropriate; they indicate the tests that will provide information about the target event. DPL names the chance nodes according to the first row (heading) of each column of data. For each chance node, DPL defines the number of states based on the data in the corresponding column.
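As a rough illustration of what the states and marginal probabilities of a single column look like, the Python sketch below counts the distinct values in a column and converts the counts to relative frequencies. It assumes that each distinct value corresponds to one state and that marginals are simple frequencies; the source does not spell out DPL's estimation method, and the function name and sample data are hypothetical.

```python
from collections import Counter

def states_and_marginals(column_values):
    """Map each distinct value (state) to its relative frequency."""
    counts = Counter(column_values)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

# Hypothetical "Weather" column with two distinct values.
weather = ["Sunny", "Rainy", "Sunny", "Sunny"]
probs = states_and_marginals(weather)
# probs == {"Sunny": 0.75, "Rainy": 0.25}
```

Running a check like this on each column before the import is a quick way to spot misspelled or inconsistent values that would otherwise show up as extra states.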
Versions: DPL Enterprise, DPL Portfolio