Data Mining for Business and Governance
Introduction to the course
Data Mining for Business and Governance will be accessible to all students (no technical background required). During the course, students will complete mandatory assignments in which they practice basic data mining skills, gain insights from data, and conduct reproducible research. The assignments will be performed with open-source software (Jupyter, pandas, and scikit-learn). There will be one midterm exam to help students keep up with the course contents. The course concludes with a written exam.
Watch the video lectures
Description of the contents
Steadily increasing computing power, data sources, and connectivity have made data science an important component of both industry and parts of scientific research. This course offers an introduction to this intersection of statistics, computer science, and machine learning across various domains. Upon completion of the course, students will have acquired the skills necessary to collect, analyze, and interpret data. Additionally, students will become familiar with a range of algorithms for automated decision making, as well as methods for critically evaluating their performance and their impact on society.
The course offers a thorough introduction to the use of data mining methods. Concepts and techniques covered include data collection, supervised and unsupervised learning, regression analysis and pattern classification, evaluating the quality of a learning system, association mining, and data decomposition. In addition, students will become familiar with domain-specific applications such as computer vision and text classification.
The perspective of the course is application-oriented: it aims to provide students with the knowledge and experience that match the current demand for skilled data scientists and computational researchers.
The course will be offered online. The education team has carefully organized the activities to ensure the quality of the course while keeping the stress of online teaching as low as possible. The course is divided into four components that deal with three separate parts of data mining: theory, practice, and application.
- Lectures. The lectures will be pre-recorded so that students can watch them whenever convenient and at their own pace, ensuring that no detail is lost. The main purpose of the lectures is to introduce important theoretical concepts in the field of data mining. Since the course does not follow a single textbook, students are strongly recommended to watch the lectures and take notes so that they can formulate short, sharp, and to-the-point questions.
- Reading materials. The reading materials consist of book chapters and academic papers that will help students consolidate the theoretical topics discussed in the online lectures. Students are responsible for obtaining access to the recommended books (please refer to the syllabus for details on the bibliography). The academic papers will be provided through Canvas together with the pre-recorded lectures.
- Assignments. Knowing the theory behind data mining techniques is important; however, the practical component is pivotal for data scientists. The assignments apply data mining techniques to real-world problems, which helps students better understand the theoretical concepts and the full potential of data mining.
The course is organized into three components: lectures, practicals, and Q&A sessions. The pre-recorded lecture, reading materials, and practicals will be made available through Canvas on lecture day (Tuesday) at the latest. We advise you to follow the lecture at, or close to, the designated time slot. The practicals involve working independently through hands-on exercises that build on the lecture material. Therefore, students are required to have watched the lecture and read the supplementary material beforehand. As proof of completion, students must hand in the output of the assignment. This is a mandatory component of the course and is graded pass/fail after a sanity check. On Thursdays, there will be a live Q&A session in which students can ask questions and discuss any difficulties they have with the material. Given the limited time, we ask students to formulate their questions concisely and to avoid spending much time on implementation-specific issues that would add little value to the discussion.
Learning goals
Upon completion of the course, students will have acquired the skills necessary to apply data mining to extract information from large data sets and transform it into an understandable structure. Specific learning objectives to be fulfilled include:
- Indicate important components and tools in the data science ecosystem.
- Describe and explain the elementary principles of data mining and their application in different contexts and domains.
- Employ several preprocessing and data representation techniques, and supervised and unsupervised learning algorithms.
- Analyze and evaluate reproducible data mining experiments.
- Draw conclusions on the potential and limitations of data, algorithms, and models, and their application in society.
The exams will be online unless otherwise communicated. Details of the exams will be communicated in good time so that students can adjust their expectations. Overall, passing the course is determined by three examination components:
- Assignments complete (y/n). The assignments are not graded; students receive a "complete" mark for handing in an assignment.
- Midterm grade 20%. The midterm exam will test your theoretical knowledge and practical insights (not skills: there is no programming, and the math parts are excluded) regarding the material covered up to that point. The midterm serves to gauge students' comprehension of the basic material while preparing them for the question style of the final exam.
- Exam grade 80%. The final exam covers all material of the course, including the math parts. We expect students to understand and be able to apply everything shown in the lectures. Otherwise, the examination style is similar to that of the midterm, with a few changes. Please refer to the syllabus for more information about the exams.
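As a rough illustration of how the two weights combine, the sketch below computes a weighted course grade. It is a minimal sketch under stated assumptions: the function name `course_grade` is hypothetical, both grades are assumed to be on the same numeric scale, the assignments are treated only as a prerequisite, and any minimum-grade or rounding rules in the syllabus are not modeled.

```python
from typing import Optional


def course_grade(assignments_complete: bool, midterm: float, exam: float) -> Optional[float]:
    """Hypothetical helper: combine the midterm (20%) and final exam (80%) grades.

    Assumes both grades use the same scale. Returns None when the mandatory
    assignments are incomplete, since they are required to pass the course.
    Minimum-grade and rounding rules from the syllabus are NOT modeled here.
    """
    if not assignments_complete:
        return None
    # Weighted sum: the final exam counts four times as much as the midterm.
    return 0.2 * midterm + 0.8 * exam


# Example: midterm 6.0 and final exam 7.5, with all assignments handed in.
print(course_grade(True, 6.0, 7.5))
```

With a midterm of 6.0 and a final exam of 7.5, the weighted result is 0.2 * 6.0 + 0.8 * 7.5 = 1.2 + 6.0 = 7.2 (up to floating-point rounding). Because the final exam carries four times the weight of the midterm, it largely determines the outcome.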
All communication for this course will take place via Canvas, including important announcements about the exams, materials, and scheduled sessions. Moreover, students are encouraged to post their questions in the discussion section so that other students can benefit from the answers provided by the education team.
Unfortunately, lecturers cannot reply to every email with the promptness and quality that students deserve. Therefore, students are kindly asked to refrain from emailing the lecturers with questions about the course organization or its theoretical and practical contents. Instead, students are invited to post their questions in the discussion section of Canvas to ensure efficient and transparent communication.
Of course, you can email the lecturers (preferably both) about private issues that you would rather not share with your fellow students. Please bear in mind that every issue has a solution if it is communicated properly and in good time.
Chris Emmery is a lecturer at Tilburg University, as well as a joint PhD candidate doing research for both CSAI at Tilburg and CLiPS at the University of Antwerp. He is interested in the effect of intelligent systems on our lives: systems that uncover our personal information, monitor and change our behavior, subtly restrict our exposure to information, and treat us unfairly. He has taught Data Mining in the context of the Data Science master's program for four years. He also teaches Data Processing and Spatiotemporal Data Analysis, and previously taught Text Mining.
Gonzalo Nápoles received his PhD from Hasselt University (Belgium) and Maastricht University (the Netherlands). His research interests include hybrid machine intelligence, recurrent neural networks, cognitive learning systems, knowledge engineering, and chaos theory. His teaching experience includes courses such as Business Intelligence, Knowledge Discovery, Introduction to Information Systems, Numerical Analysis, and Knowledge Representation.