Subject Code & Title :- BISY3001 Data Mining And Business Intelligence
Weighting :- 25%
Assessment Number :- 4
Assessment Type :- Report
BISY3001 Data Mining And Business Intelligence Assignment
Alignment with Unit and Course :-
Unit Learning Outcome :
ULO1: Demonstrate a broad understanding of data mining and business intelligence and their benefits to business practice.
ULO2: Choose and apply models and key methods for classification, prediction, reduction, exploration, affinity analysis, and customer segmentation that can be applied to.
ULO3: Analyse appropriate methods for classification, prediction, reduction, exploration, affinity analysis, and customer segmentation to data mining.
ULO4: Propose a data mining approach using real business cases as part of a business intelligence strategy.
ULO5: Propose a data mining approach using real business cases as part of a business intelligence strategy.
Assessment Description :-
In this assessment, students will extend their previous work from Assessment A3 (Business Case Understanding). Students must submit a report on the data mining process applied to a real-world scenario, and a Presentation and Q&A Session will be held based on the written report. The report must detail every step the students followed.
Important: Students who do not participate in the Presentation and Q&A Session (session 11) will receive a zero mark for Assessment 4.
Cover Page and Table of Contents (0.5 Mark)
• Group Members
Executive Summary (0.5 Mark)
Introduction (0.5 Mark)
• Importance of the chosen area
• Why this dataset is interesting
• What has been done so far
• What can be done
• Description of the present experiment
Data preparation/pre-processing and feature extraction (2.5 Marks)
2.1 Select Data
• Task: Select data
2.2 Clean Data
• Task: Clean data
2.3 Construct data/feature extraction
• Task: Construct data
• Output: Derived attributes
• Activities: Derived attributes
• Add new attributes to the accessed data if required
• Activities: Single-attribute transformations
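As a rough illustration only (the lab work itself is done in KNIME), steps 2.1 to 2.3 can be sketched in plain Python. The records, the field names (`quantity`, `unit_price`, `total`), and the helper functions below are hypothetical and are not taken from any A3 dataset; the sketch assumes that rows with missing values are simply dropped and that min-max scaling is the chosen single-attribute transformation.

```python
# Hypothetical records; field names are illustrative assumptions only.
raw = [
    {"id": 1, "quantity": 2, "unit_price": 10.0, "notes": "a"},
    {"id": 2, "quantity": None, "unit_price": 5.0, "notes": "b"},
    {"id": 3, "quantity": 4, "unit_price": 2.5, "notes": "c"},
]

def select_data(rows, columns):
    """2.1 Select data: keep only the attributes needed for the experiment."""
    return [{c: r[c] for c in columns} for r in rows]

def clean_data(rows):
    """2.2 Clean data: here, drop any row that has a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def construct_data(rows):
    """2.3 Construct data: add a derived attribute (quantity * unit_price)."""
    return [{**r, "total": r["quantity"] * r["unit_price"]} for r in rows]

def min_max(rows, column):
    """Single-attribute transformation: rescale one column to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, column + "_scaled": (r[column] - lo) / span} for r in rows]

selected = select_data(raw, ["id", "quantity", "unit_price"])
cleaned = clean_data(selected)
derived = construct_data(cleaned)
prepared = min_max(derived, "total")
```

Whatever tool is used, the report should record which rows and attributes were removed at each step and why.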
You must use the public dataset you previously selected for A3 from the websites mentioned on page 1.
Select one or more experiments from the list (your Lecturer may choose one for you):
A. Build a simple classifier and apply it to the dataset (Decision Tree)
B. Cluster Analysis (K-Means)
C. Topic Detection Analysis (import public post comments from Twitter, Facebook, or Instagram with the help of exportcomments.com).
Additional experiments may carry some bonus marks; talk to your Lecturer.
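For intuition about experiment B, the core loop of K-Means can be sketched in plain Python. This is a minimal sketch and not the KNIME node used in the lab: the sample points are made up, the function name `kmeans` is hypothetical, and the centroids are seeded with the first k points (an assumption chosen to keep the example deterministic; real tools usually seed differently).

```python
import math

def kmeans(points, k, iterations=10):
    """Minimal K-Means: assign each point to its nearest centroid, then re-average."""
    # Assumption: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignment

# Made-up 2-D points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
```

The choice of k and the distance measure are exactly the kind of parameter settings that section 3.6 asks you to document and justify.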
3.1 Select Modelling Technique
• Task: Select Modelling Technique
3.2 Output Modelling Technique
• Record the actual modelling technique that is used.
3.3 Output Modelling Assumption
• Activities: Define any built-in assumptions made by the technique about the data (e.g., quality, format, distribution). Compare these assumptions with those in the Data Description Report. Make sure that these assumptions hold, and step back to the Data Preparation Phase if necessary. You can explain the data file here, even if it is pre-prepared.
3.4 Generate Test Design
• Activities: Check existing test designs for each data mining goal separately. Decide on the necessary steps (number of iterations, number of folds, etc.). Prepare the data required for the test. You can use 66% of the records for model building/training and the rest for testing.
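The 66% / 34% split suggested above can be sketched in plain Python as follows. This is only an illustration of the idea; in the lab, the split is configured in KNIME. The function name `train_test_split`, the seed value, and the integer records are assumptions made for the example.

```python
import random

def train_test_split(records, train_fraction=0.66, seed=42):
    """Shuffle the records reproducibly, then cut them into training and test sets."""
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for 100 dataset rows.
records = list(range(100))
train, test = train_test_split(records)
```

Fixing the seed means the same split can be reproduced when the experiment is rerun, which makes the results in the report easier to verify.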
3.5 Build a Model
• Task: Build a model. Run the modelling tool on the prepared dataset to create one or more models (using the KNIME tool, as shown in the lab).
3.6 Output Parameter Settings
• Activities: Set initial parameters. Document reasons for choosing those values.
• Activities: Run the selected technique on the input dataset to produce the model. Post-process data mining results (e.g., editing rules, display trees).
3.7 Output Model Description
• Activities: Describe any characteristics of the current model that may be useful for the future. Give a detailed description of the model and any special features.
• Activities: State conclusions regarding patterns in the data (if any); sometimes the model reveals important facts about the data without a separate Assessment process (e.g., that the output or conclusion is duplicated in one of the inputs).
4. Result Analysis / Evaluation
Previous evaluation steps dealt with factors such as the accuracy and generality of the model.
This step assesses the degree to which the model meets the business objectives and seeks to determine if there is some business reason why this model is deficient. It compares results with the evaluation criteria defined at the start of the project. A good way of defining the total outputs of a data mining project is to use the equation:
RESULTS = MODELS + FINDINGS
In this equation, the total output of the data mining project is not just the models (although they are, of course, important) but also the findings, which we define as anything apart from the model that is important in meeting the objectives of the business or in leading to new questions, lines of approach, or side effects (e.g., data quality problems uncovered by the data mining exercise).
Note: although the model is directly connected to the business questions, the findings need not be related to any question or objective, but they are important to the initiator of the project.