News & Events


SCSE’s Undergraduate – Interesting Final Year Projects AY17/18 (June)

Published on: 11-May-2018

Interface for Programmable Command and Control of Autonomous Drones

Author: Bhaarathan Omarsangar (Computer Engineering, minor in Business)
Supervisor: Professor Tan Ah Hwee

Drone technology has been growing rapidly, and drones are now used not only for personal but also for commercial purposes. They are currently being tested for delivering food and as an alternative mode of transporting goods. By reducing human workload, they increase productivity.

In this project, an application called the Self-Fly App was developed to communicate with the Bebop drone. Building upon the FreeFlight Pro App, the Self-Fly App allows the user to operate the drone not only manually but also autonomously.

The main functions of the app let the user control the drone through three different control systems, namely Manual, Autonomous and Script. Integrated with Google Maps, the app only requires the user to enter the destinations or the flight path for the drone. In addition to the control systems, the app has features that allow the drone to autonomously land, or return to the start point, upon reaching a low or critical battery level.

For first-time users, the Self-Fly App further includes a tutorial section that guides the user step-by-step with detailed information on the functions implemented.

For the development of the Self-Fly App, the Model-View-Presenter (MVP) pattern was used as the design architecture. MVP was chosen because it eases the development of Android applications. The implementation of the app was tested using a set of 19 use cases.

This project also produces a video, which demonstrates the functionalities of Self-Fly. The video is organized according to the three scenarios, which demonstrate the three different control systems. In each scenario, the video demonstrates what the user can expect, explains how the App responds and how the drone reacts and behaves.

During the project, many obstacles had to be overcome due to the limitations of the equipment as well as of the developer. Firstly, the GPS chip on the drone is a low-precision tracker, and its GPS signal is only available in an open field.

In terms of software limitations, the drone's sensor readings are generally inaccessible, though the drone can report the working status of each sensor. Environmental conditions also play a huge role when operating the Bebop drone; for example, the drone will wobble in the air if the wind speed exceeds 24 mph. As the sensors are inaccessible, the drone had to be operated in an open field without any obstruction in the flight path.

Through this project, I have gained in-depth knowledge of developing Android applications using Android Studio. I learned how to develop an app that communicates with an external hardware device like a drone, and about the different types of drones in the market, including their functionalities, capabilities and limitations. For an app to be compatible with a hardware device, it is necessary to know the ins and outs, pros and cons of the device before designing or implementing the app; without knowledge of the hardware's features and limitations, the app will be unworkable. Most importantly, I gained the courage and confidence to work on a project independently, to approach people, and to post queries on forums without hesitation whenever I had a question to clarify.

SCSE YouTube link:

Attention Driven Brain Computer Interface Game

Author: Goh Ka Hian (Computer Science, minor in Business)
Supervisor: Professor Dusit Niyato

Brain Computer Interface (BCI) is an emerging technology that provides ways of interacting with computers by means of manipulation of brain activities without depending on muscle intervention. From the Human Computer Interaction (HCI) point of view, BCI can be utilized to make games more interesting as a novel input modality.

BCI can also be used to estimate a person’s mental performance, such as attentiveness which can be utilized as an innovative game input. This project focuses on designing appealing games that utilize BCI to quantify a person’s attention and translate that into innovative game input.

In the first phase of the project, commercially available BCIs were used to quantify a person's attention. Experiments were also conducted to verify the accuracy of the measurement methods. The second phase of the project focused on game development using the Unity game engine. Two 3D attention-driven BCI games were successfully implemented by the end of the project; both feature fully functional AI and online multiplayer modes. For future work, state-of-the-art machine learning algorithms can be explored to further improve the accuracy of the measurements.

In the first game, titled ‘Focus Druid’, attention is used as a game input, affecting parameters of the controlled avatar such as speed and attacking power. The game rewards players for being mentally attentive, facilitating the achievement of game objectives.

The second game, titled ‘Psychic Ball’, is developed for mobile devices; players are required to stay focused to maintain control of the ball, whose speed is proportional to the player's attentiveness.

SCSE YouTube link:

White matter connectivity for detection of Alzheimer's disease

Author: Huang Dajing (Computer Science, minor in Business)
Supervisor: Professor Jagath Chandana Rajapakse, Dr Sui Xiu Chao

Alzheimer’s disease (AD) is the most common type of dementia, characterized by progressive neurodegeneration and cognitive impairment. Patients typically experience progressive loss of cognitive abilities which eventually becomes serious enough to interfere with one’s daily life.

AD is considered a grey matter disease, since neuron loss in the hippocampus and cortical atrophy in the temporal lobe were consistently found in earlier structural Magnetic Resonance Imaging (MRI) studies. With the development of neuroimaging techniques, diffusion tensor imaging (DTI) has offered researchers a new window into the white matter integrity and fibre organization of the human brain. Recent studies suggest that white matter impairment also contributes to AD pathology, so DTI may contribute to early diagnosis of AD and to individual cognition prediction. Hence, this project was designed to construct the structural connectome (i.e. fibre organization) of the brain, which was further employed for individual classification and cognition-score prediction for participants across the disease stages: Normal (NC), Significant Memory Concern (SMC), Early Mild Cognitive Impairment (EMCI), Late Mild Cognitive Impairment (LMCI) and Alzheimer's Disease (AD). 953 sets of MRI scans from 283 subjects were used for this project; each set consists of a DTI scan and a structural MRI scan.

For structural network construction, DTI tractography was first used to detect the fibre organization of the brain, which was then combined with structural MRI data to construct connectivity matrices that represent the fibre organization of the brain. For individual classification of disease stages, the matrices were used as features for a Support Vector Machine (SVM) classifier. For individual prediction of cognition scores, the matrices were fed to an SVM regressor to predict the composite memory score (ADNI-MEM), the composite executive function score (ADNI-EF) and the Mini-Mental State Examination (MMSE).
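The classification and regression step above can be sketched with scikit-learn, assuming each subject's connectivity matrix is flattened into a feature vector. The data below is synthetic, not ADNI data, and the kernel choice is an illustrative assumption rather than the authors' actual configuration.

```python
# Minimal sketch of the SVM classification / regression step.
# Synthetic stand-in data; flattening the upper triangle of each
# symmetric connectivity matrix yields one feature vector per subject.
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
n_subjects, n_regions = 60, 10
iu = np.triu_indices(n_regions, k=1)          # upper-triangle indices
features = np.array([
    rng.random((n_regions, n_regions))[iu] for _ in range(n_subjects)
])
stages = rng.integers(0, 5, size=n_subjects)  # NC..AD encoded as 0..4
mmse = rng.uniform(0, 30, size=n_subjects)    # a cognition score

clf = SVC(kernel="linear").fit(features, stages)  # disease-stage classifier
reg = SVR(kernel="linear").fit(features, mmse)    # cognition-score regressor

print(clf.predict(features[:3]), reg.predict(features[:3]))
```

The same matrices serve both tasks; only the target (a discrete stage label vs. a continuous score) changes.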

The structural connectome based on DTI achieved high accuracy for individual prediction of disease stages and cognitive scores. The 5-class classification tasks achieved an average F1-score of 0.89 (p<0.05). The ADNI-MEM prediction achieved a Mean Squared Error (MSE) of 0.32 and coefficient of determination (R2) of 0.55 (p<0.05). The prediction of ADNI-EF achieved MSE of 0.36 and R2 of 0.53 (p<0.05) and the prediction of MMSE achieved MSE of 4.02 and R2 of 0.55 (p<0.05).

SCSE YouTube link:

IP@Remote – Intelligent Photography Remote (with Arduino & Android)
Water Droplet Photography

Author: Jasmine Wong Zhen Ying (Computer Science)
Supervisor: Clement Chia Liang Tien

High-speed photography is the art of capturing an image of very fast phenomena; the collision between two water droplets occurs within a window of 20-40 milliseconds. In the commercial market of high-speed photography peripherals, there is a range of devices targeted mainly at scenarios involving sound, light and laser sensors. Although water droplet photography kits exist, most commercial kits are sold as standalone systems.

This project aims to enhance and maintain the features of the previous project, as well as to accommodate the integration of a water droplet system. The focus of the project is heavily concentrated on the design and implementation of a water droplet system that renders assistance to the user in achieving realistic expectations in the art of water droplet photography.

Currently, the system can control up to three solenoid valves, which grants the user the ability to achieve different forms of water droplet artistry through precise manipulation of the droplet collision sequence. Depending on how the user conducts the experiment, various beautiful water droplet collision effects can be achieved, as shown in the pictures.

Through the harmonization of hardware and software, high precision in the manipulation of water droplets could be achieved. This unravels the endless possibilities of producing photography art in water droplets.

SCSE YouTube link:

Building Agent for Power TAC

Author: Jiang Xiaoxuan (Computer Science)
Supervisor: Associate Professor Bo An

As we transition into low-carbon economies, the adoption of smart grid technology and the increasing number of renewable energy suppliers in the electricity market have led to an energy revolution, characterized by more autonomous demand-side participation and market deregulation. These changes demand improvements in market structure and participant behaviour.

The cost of testing new approaches for the smart grid market in a real market is too high. Power TAC provides a safe simulation environment in which participants develop agents to act as brokers, whose primary goal is to maximize their cash position. In this project, we developed a broker which performed well in our experimental setting, a simulation involving the top three agents from the 2017 competition. We used separate strategies in three markets, namely the wholesale, tariff and balancing markets, to ensure that we buy electricity at a relatively low price and earn a steady, high revenue in the tariff market. At the same time, we need to maintain real-time balance, as imbalance between supply and demand is highly discouraged.

There were three research challenges in building the agent. The first was to find a way to predict the clearing price during the bidding process in an ever-changing environment; we modelled this as a deep reinforcement learning problem. The second was to predict supply and demand quantities in order to maintain real-time balance and to set enticing yet lucrative tariff prices; we built mathematical models to predict renewable energy production based on weather reports and to predict customer demand from the historical record. The third was to find a way to outperform other competitors in the market; we differentiated ourselves by providing a portfolio of tariffs of different types serving different purposes, such as earning aggressive profit, flattening peak demand of electricity usage and ensuring stable revenue.
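Predicting demand from the historical record can be illustrated with a small sketch. The autoregressive model form, lag window and demand figures below are assumptions made for illustration, not the project's actual Power TAC models.

```python
# Hedged sketch: predicting customer demand from its own recent history
# with an ordinary least-squares autoregressive model.
import numpy as np

def fit_ar(history, lags=3):
    """Fit demand[t] ~ w . demand[t-lags:t] + b by least squares."""
    X = np.array([history[i:i + lags] for i in range(len(history) - lags)])
    y = np.array(history[lags:])
    A = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_next(history, coef, lags=3):
    """One-step-ahead forecast from the last `lags` observations."""
    return float(np.dot(coef[:-1], history[-lags:]) + coef[-1])

# Synthetic, slowly rising demand series (arbitrary units per timeslot).
demand = [100 + 2 * t for t in range(20)]
coef = fit_ar(demand)
print(predict_next(demand, coef))   # close to 140 for this linear series
```

A real broker would add exogenous inputs (weather forecasts, time of day) as extra columns of the design matrix; the least-squares fit itself is unchanged.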

SCSE YouTube link:

Framework for Remote Monitoring System using a Mobile Robot

Author: Koh Cheng Khim (Computer Engineering)
Supervisor: Asst Prof Lam Siew Kei

With the increasing number of elderly people living independently at home, the possibility of them having accidents at home is on the rise as well. The goal of this project is to develop a mobile robotic system that can interact with the elderly and allow a caregiver to perform remote monitoring. The system combines a mobile robot with vision-sensing capabilities and a chatbot.

The mobile robotic system allows the caregiver to monitor the condition of each elderly person using the vision-based sensing function of the robot. Once a command is given, the robot detects the location of the person, takes a picture and sends the image to the caregiver for assessment.

The system also implements a motion detection algorithm. Once a motion is detected and perceived as hazardous, the caregiver can take the necessary actions based on the image received, thus lowering the rate of serious injuries.

Caregivers are also able to perform real-time checks on the elderly via the mobile robotic system, granting them quicker monitoring access at any point in time. The elderly can interact with the chatbot on various topics like weather and news, so that it accompanies them as a virtual companion.

SCSE YouTube link:

Gated Convolutional Neural Network for Fine-grained Automatic Essay Scoring

Author: Lee Xing Zhao (Computer Science)
Supervisor: Associate Professor Hui Siu Cheung

English is spoken across the world as an international language. However, I have seen many people who may be able to read English but struggle with writing and speech, including my friends of different nationalities in secondary school, polytechnic and even university. This is common in countries where English is not the main language of communication. Therefore, proper English education in the early stages of life is very important.

Composition writing has been a crucial segment of English education and is tested heavily in school exams. It is also a good judge of one's command of the English language. Writing a good English composition requires an accurate understanding of the topic, proper usage of vocabulary and a good command of grammar structures. The most effective way for students to improve their writing skills is through repeated exercises. However, compositions are hard to assess, and due to the limited number of teachers available in each school, giving students adequate composition exercises is not practical. The result is poor writing, and in turn poor usage of English in daily life.

I have always been interested in artificial intelligence (AI), ever since I coded my first classic "hello world" program. In recent years, deep learning has bloomed in the field of AI; it is inspired by the human brain's ability to learn automatically. For my final year project (FYP) at Nanyang Technological University, I got the opportunity to learn about deep learning and to build something with it. Hence, I decided to work on the Automated Essay Scoring (AES) problem. AES has desirable impacts that benefit both the education industry and society: it aims to reduce the workload of teachers and educators, enabling them to focus their efforts more on teaching and planning.

There is existing work on deep learning for AES that learns purely from word-level representations, but that may not be a proper way to judge an essay, as misspelled words are treated as unknown tokens by the model's dictionary. Furthermore, rare words are unidentifiable if the dictionary does not contain them. For my FYP, I implemented a novel deep learning network for AES that learns representations at both the word and character levels. The character level helps preserve the semantics of the text, which is remarkably influential in a fair judgement of an essay. After all, a good AES system should take in everything in the essay and assign credit where credit is due, like a human grader would. Experiments were carried out to investigate the effectiveness of the new model on a benchmark dataset from Kaggle, and a few baseline models were implemented for comparison. The results show that the new model outperforms recent available models with a smaller memory footprint and faster training time.
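The gating mechanism that gives a gated convolutional network its name can be sketched briefly: one convolution produces a linear response that a second, sigmoid-squashed convolution modulates element-wise. The weights below are random and the shapes invented; a real AES model would learn stacked layers of this kind over word and character embeddings.

```python
# Sketch of the gating idea behind a gated convolutional layer:
# output = conv(x) * sigmoid(conv_gate(x)), element-wise.
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    """Valid 1D convolution over a (seq_len, dim) input."""
    k = len(w)
    return np.array([np.sum(x[i:i + k] * w) for i in range(len(x) - k + 1)])

def gated_conv(x, w_lin, w_gate):
    h = conv1d(x, w_lin)                           # linear path
    g = 1.0 / (1.0 + np.exp(-conv1d(x, w_gate)))   # sigmoid gate in (0, 1)
    return h * g                                   # gated output

x = rng.normal(size=(12, 8))       # 12 tokens, 8-dim embeddings
w_lin = rng.normal(size=(3, 8))    # kernel width 3
w_gate = rng.normal(size=(3, 8))
out = gated_conv(x, w_lin, w_gate)
print(out.shape)   # (10,): one gated activation per window
```

Because the gate is bounded in (0, 1), it can only attenuate the linear path, letting the network learn which n-gram windows of the essay to pass through.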

Lastly, I'm very grateful that I had been given the opportunity by School of Computer Science and Engineering (SCSE) to learn something that I'm interested in and come up with a successful solution that might be able to help the public as my FYP project. I hope the public can be aware of the effectiveness of using AES system in the English education.

SCSE YouTube link:

Game Design & Control using EEG Signals

Author: Ng Yanrong, Lynette (Computer Science)
Supervisor: Dr Smitha Kavallur Pisharath Gopi

There are a lot of online reviews of the medical treatments provided by gynecologists, and it would be too laborious for humans to process all the reviews at one go. Hence, this final year project was developed to analyze the comments in web forums and find out patients' impressions of their gynecologists. Various measures were explored to write an entity recognition algorithm for identifying the gynecologists mentioned in the reviews, and data mining techniques were employed to extract the essential information. Lastly, a web application was developed to let users know the relative cost of treatments and the treatment experiences with each gynecologist. Statistics on the top 10 gynecologists in terms of medical experience and cost of treatment are also rendered on the web page.

SCSE YouTube link:

Interactive Mobile Profiler

Author: Nicholas Lee Yong Hwee (Computer Science)
Supervisor: Dr Owen Noel Newton Fernando

Interactive Mobile Profiler is a location-based Android application that matches events created by users to other users. The target audience ranges from casual users to big event organizers, and users are split into two types: organizers and potential participants.

Users can specify their interests in various topics and modify them at any time. These preferences are used to serve interest-specific events to them. Moreover, participants can also view events in their vicinity, and the distance search range can be tweaked at any time. These events are matched dynamically and in real time: the moment an organizer posts an event, everyone in the vicinity or with matching interests sees it instantly. This is achieved by building the application on Amazon DynamoDB, which boasts single-digit-millisecond latency.

Organizers can create events, styled as advertisements, that are shown to interested and/or nearby users. These events are tagged with specific interests, which are used to surface them to relevant users.

Furthermore, the application also incorporates various quality-of-life features such as an intuitive user interface, ability to get step-by-step directions of an event, social media sharing and many more.

SCSE YouTube link:

Visual Target Selection using Electroencephalography (EEG) based Brain-Computer Interface (BCI)

Author: Oh Yoke Chew (Computer Engineering)
Supervisor: Dr Smitha Kavallur Pisharath Gopi

In patients with severe locked-in syndrome, motor control of all four limbs and the ability to speak are lost, and reliance on others for daily needs is inevitable. The loss of the ability to communicate makes caring for such patients all the more difficult. To alleviate this problem, the Brain-Computer Interface (BCI) was proposed as a tool for communicating with external devices.

Brain-Computer Interface (BCI) is a system that decodes neural activity into commands used to communicate with an external device. Of all available invasive and non-invasive techniques for acquiring brain activity, Electroencephalography (EEG) is often the most preferred due to its non-invasiveness and high temporal resolution. EEG acquires the electrophysiological activity of the brain through electrodes placed on the scalp.

In this project, the aim was to develop an EEG-based BCI. The BCI is based on Covert Visual Spatial Attention (VSA) and is used to control an external device, a Sphero. Since VSA is a voluntary process, a directional cue can be used as a stimulus to elicit neural oscillations. In addition, the project explored whether a mixture of the alpha band (8-14 Hz) and the beta band (14-30 Hz) could improve accuracy. For acquisition of EEG data, the 14-channel Emotiv Epoc+ Neuroheadset was used. The subject first goes through a training stage in which a directional cue is shown prior to the start of each trial; the subject fixates on a central cross and shifts attention according to the cue. The data collected in each trial are pre-processed with a digital Butterworth filter to obtain the desired band signal, and the average band power of the filtered signal is calculated and used to train a Linear Discriminant Analysis (LDA) model. The subject then goes through a test stage in which the LDA model predicts the direction of the attention shift. Cross-validation (leave-one-out) and test accuracy are used as evaluation metrics for this BCI.
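The filter-then-band-power-then-LDA pipeline can be sketched as follows. The sampling rate, trial length and synthetic signals are illustrative assumptions, not the actual Emotiv recording setup; only the processing order mirrors the description above.

```python
# Sketch of the signal pipeline: Butterworth band-pass filter, average
# band power per trial, then an LDA classifier on the power features.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

fs = 128                                         # sampling rate (Hz), assumed
b, a = butter(4, [8, 14], btype="band", fs=fs)   # alpha band 8-14 Hz

def band_power(trial):
    filtered = filtfilt(b, a, trial)   # zero-phase band-pass filter
    return np.mean(filtered ** 2)      # average power within the band

rng = np.random.default_rng(0)
t = np.arange(fs * 2) / fs             # 2-second trials

def trial(alpha_amp):                  # noise plus an alpha-band oscillation
    return rng.normal(0, 0.5, len(t)) + alpha_amp * np.sin(2 * np.pi * 10 * t)

# Two synthetic attention conditions: strong vs. weak alpha activity.
X = np.array([[band_power(trial(2.0))] for _ in range(20)] +
             [[band_power(trial(0.2))] for _ in range(20)])
y = np.array([0] * 20 + [1] * 20)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.score(X, y))                 # high training accuracy expected
```

With real EEG, one feature per channel (rather than a single scalar) would be fed to the LDA model, and leave-one-out cross-validation would replace the training-set score.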

SCSE YouTube link:

AudiCeive – Recognising unique audio frequencies through acoustic fingerprinting

Author: Saraswathi Karuppia (Computer Science)
Supervisor: Dr Owen Noel Newton Fernando

With the rising trend of children owning a smartphone from a very young age these days, it has become a great concern for parents when their kids do not use it for the right purposes and end up spoiling their health unknowingly.

This project aims to address this issue by incorporating acoustic fingerprinting technology into an Android application that makes the knowledge transfer process more appealing, especially since such educational applications are scarce.

An acoustic fingerprint is a summary of a signal using a limited number of bits, usually used to determine the equality of two audio signals. Digital signal processing is applied to create the fingerprints from ultrasounds, which are embedded in the video clip so as to maximise the signal-to-noise ratio.

These fingerprints, as well as the corresponding scene details for every ultrasound, are pre-stored in the database. When the user places the phone next to the audio source, the app constantly makes a real-time recording of the sound, and the same fingerprinting algorithm used earlier on the ultrasounds is applied to the signal. If the fingerprint matches one in the database, the respective scene data is displayed in the app.
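The record-fingerprint-lookup loop can be sketched with a toy fingerprint that summarises a signal by its strongest spectral peaks. Real acoustic fingerprints use far more robust peak-constellation hashing, and the scene names and frequencies here are invented for illustration.

```python
# Toy sketch of fingerprint matching: summarise a signal by the indices
# of its strongest spectral peaks and look that summary up in a table.
import numpy as np

fs = 8000
t = np.arange(fs) / fs                 # one second of signal

def fingerprint(signal, n_peaks=3):
    spectrum = np.abs(np.fft.rfft(signal))
    peaks = np.argsort(spectrum)[-n_peaks:]   # strongest frequency bins
    return tuple(sorted(peaks))               # hashable summary

def tone(freqs):
    return sum(np.sin(2 * np.pi * f * t) for f in freqs)

# Pre-stored database: fingerprint -> scene details (invented examples).
database = {
    fingerprint(tone([1000, 1500, 2000])): "Scene 1: museum entrance",
    fingerprint(tone([1100, 1600, 2100])): "Scene 2: main gallery",
}

# A noisy "recording" of the first scene's embedded tones still matches.
recording = tone([1000, 1500, 2000]) + np.random.default_rng(0).normal(0, 0.1, fs)
print(database.get(fingerprint(recording), "no match"))   # Scene 1: museum entrance
```

Robustness in practice comes from using many local peak pairs rather than three global maxima, so that partial noise or occlusion still leaves enough matching hashes.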

The interactive nature of the app allows children to develop their interest in learning new things. The fingerprints extracted are also very robust, due to the usage of ultrasounds.

Ultratour was created to work alongside AudiCeive, as it can be used in place of the video to play the ultrasounds. This application was created to help tour guides make learning about places of interest easier.

Ultrasounds are extremely powerful and when combined with digital signal processing, remarkable products can be created and AudiCeive is one such example.

SCSE YouTube link:

Real-time Brain Computer Interface (BCI) for Robotic Systems

Author: Deon Seng Wee How (Computer Engineering)
Supervisor: Asst Prof Lam Siew Kei

Research and development in computer technology has been intense and successful since the early 20th century, giving us most of our prized possessions today, such as smartphones and computers. Today, technology is ubiquitous, with an ever-growing demand for better technology. Not only does it improve and change our daily lives, computer technology has also contributed greatly to medical operations. One trending research area today is the Brain Computer Interface (BCI), with the purpose of assisting people with disabilities.

This specialized field of research utilises the electrical signals that brain activity produces on the scalp of the subject's head. Using commercially available headwear, BCI technology can be developed easily. In this project, a variant of BCI technology is used: a Hybrid Brain Computer Interface (HBCI), which combines brain-activity-generated signals with other additional components, here focusing mainly on muscles around the facial region, such as the jaw and eye muscles. The purpose is to develop a more robust HBCI system that is easier to use, as signals generated by the brain are easily corrupted by noise from other psychological effects and thoughts, while the electrical signals generated by the muscles were found to be much easier to capture and process.

The HBCI system proposed in this project involves the use of open-source BCI technology with muscle signal sensing capability, with a laptop to interface between the headset and the targeted device for application, in this case, a two-wheel motor robot. On the laptop, a software simulated maze exploration game was developed as an application of this project.

The controls for the robotic system and the maze exploration game are the same: clenching the jaw is read as forward, while closing the left or right eye is read as anticlockwise or clockwise rotation respectively. For each user, a profile must be created, and the user must go through a training process; this allows the system to recognise the signals generated by each individual. A neural network is trained using the pre-processed training data from each individual. After that, the user can run a live test to determine the accuracy of the training, in which the user performs actions randomly chosen by the computer; at the end, the accuracy results are shown.

The user can also proceed to use the application developed for the project. The maze game allows the user to control an avatar through a maze, with reaching its end as the goal. In addition, the user can choose to start the robotic system, a two-wheel motor robot powered by an Arduino. The robot design is based on a project that is part of our school's curriculum; modifications were made to allow wireless connectivity and interfacing with the laptop and the BCI board. Using XBee technology, we can easily connect the laptop to the robot over Bluetooth Low Energy (BLE). By configuring the XBee transceivers with the XCTU software, communication settings can be easily accessed and personalised. The set-up is transparent: with a simple serial read/write in MATLAB and on the Arduino, the XBees can communicate with each other without delving deeply into the Bluetooth stack. Another advantage of XBee technology is the ability to create a mesh of transceivers, allowing interconnectivity between multiple devices; for simplicity, only one device is used in this project.

An experiment was conducted for this project: a total of 5 subjects, including myself, participated in testing the system. The experiment results are shown below.

We can see that the system showed reasonable accuracy in the actual live testing despite the difference in accuracy levels between training and testing.

This project demonstrated the simplicity of using commonly available signals from the facial region, compared to conventional BCI, which focuses solely on brain activity. This extends the potential of HBCI to improve the performance of such systems. However, the technology is not without limitations: HBCI may not be applicable for everyone, as it relies on other physically available electrical signals, which may not be present in everyone with medical disabilities.

SCSE YouTube link:

First-Person-View remote driving vehicle

Author: Gan Ming De (Computer Engineering)
Supervisor: Assoc Prof Nicholas Vun

This project is a web-based interface to remotely 'drive' a robotic vehicle using a steering wheel and pedals. It is based on streaming video from an on-vehicle camera in first-person view with virtual reality. Users control the movement of the robot with the steering wheel and pedals, and control the camera rotation with an Android phone mounted in a VR headset.

It was developed using wireless technology, live streaming application, robot control and Gamepad API. The hardware used in this project include Raspberry Pi 3, Arduino, USB Webcams, Driving Steering Wheel with Pedal, Android phone, and VR-headset.

The Raspberry Pi sets up a wireless access point for communicating with the other devices. The computer and the Android phone connect to the Raspberry Pi via Wi-Fi, while the Arduino is connected to the Raspberry Pi via the serial port ‘/dev/ttyACM0’.

The steering wheel and pedals are connected to a computer via USB. Port 8080 is used for communication between the computer and the Raspberry Pi; the commands sent from the PC are used to control the robot's movement.

Two USB webcams are connected to the Raspberry Pi. The Android phone can stream the video of both webcams by connecting to streaming ports 8081 and 8082. It also sends the orientation of the user's head movement to the Raspberry Pi via port 1234, with commands used to control the camera rotation.

The Arduino controls two DC motors and two servo motors by listening for commands from the Raspberry Pi. The servo motors perform the pan and tilt controls.
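A command relay like this typically uses a small text protocol between the Pi and the Arduino. The command names, field layout and value ranges below are invented for illustration; the project's actual protocol is not documented here.

```python
# Hedged sketch of a tiny text protocol for relaying driving commands
# to the Arduino over the serial link. Field names are hypothetical.
def encode_command(steer, throttle, pan, tilt):
    """Pack control values into one line, e.g. 'S:-30;T:80;P:10;L:-5'."""
    return f"S:{steer};T:{throttle};P:{pan};L:{tilt}\n"

def decode_command(line):
    """Parse a command line back into a {field: value} dict."""
    fields = {}
    for part in line.strip().split(";"):
        key, value = part.split(":")
        fields[key] = int(value)
    return fields

msg = encode_command(steer=-30, throttle=80, pan=10, tilt=-5)
print(decode_command(msg))   # {'S': -30, 'T': 80, 'P': 10, 'L': -5}
```

On the Pi side such a line would be written to the serial device (e.g. with pyserial's `Serial.write`), and the Arduino sketch would parse it in its main loop to drive the DC and servo motors.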

One of the challenges faced during this project was understanding the background of the different platforms and devices, for example the technique of transmitting data from the Android application to an HTTP server in Node.js, as each platform adopts a different programming language. It was also a challenge to achieve a stereoscopic 3D view in virtual reality because of the difference between the human eye and a camera: the human eye allows stereoscopic 3D vision, while a camera captures only 2D images, which have no depth.

From the results of this project, it can be concluded that the wireless connection between the devices worked properly, as data was transmitted successfully. Based on a comparison of different camera streaming settings, a resolution of 320 x 240 at 60 frames per second was chosen, as it provides better performance in terms of lower latency when streaming both videos on the Android application.

Although stereoscopic first-person view in virtual reality was implemented, the function had various limitations: the streamed video is first-person view but not perfectly displayed in stereoscopic view. A better hardware or image-processing solution is required to resolve this issue.

Through this project, the goal of designing a stereoscopic first-person-view robotics system was accomplished, although it required knowledge of different areas such as hardware, wireless networking, server communication, mobile applications and stereoscopic vision. This project gives insight towards a new milestone in the virtual reality and robotics industries. Last but not least, it was a great opportunity for a student to learn and apply both hardware and software skills and knowledge.

SCSE YouTube link:

Cancer Genomics Data System

Author: Huang Xuhui (Computer Science)
Supervisor: Assoc Prof Zheng Jie

In an age of information boom, massive biological data are generated daily, and these data are becoming ever more important for scientists and the pharmaceutical industry in studying and developing new drugs and therapeutic methods. Traditionally, biological data are stored in the databases of organizations ranging from university research labs and government research institutes to biotech companies. Scientists or drug companies may obtain part of the data from published biomedical papers, but not whole datasets, and it is not easy to retrieve the data from limited documents.

To address users' growing needs for biological data, we proposed a cloud database with RESTful API services, so that users such as biological application programmers, researchers and drug companies can access selected biological data anywhere and anytime through our API. In addition, an Android client application was developed to present the data on mobile devices. With its OCR function, users can retrieve data from the cloud database simply by providing a related image or a screenshot of a document.
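The article does not specify the API routes; purely as an illustration of how a client might address such a RESTful service, a query URL could be assembled as follows (the host, route and parameter names are hypothetical, not the actual SynLethCloudDB API):

```python
from urllib.parse import urlencode

def build_query_url(base_url, resource, **filters):
    """Assemble a RESTful GET URL with query-string filters.

    The resource and filter names used below are illustrative only;
    the real SynLethCloudDB routes may differ.
    """
    query = urlencode(sorted(filters.items()))
    return f"{base_url}/{resource}?{query}" if query else f"{base_url}/{resource}"

url = build_query_url(
    "https://synlethclouddb.example.net/api",  # hypothetical host
    "genepairs",
    gene="BRCA1",
    limit=20,
)
print(url)
# https://synlethclouddb.example.net/api/genepairs?gene=BRCA1&limit=20
```

A GET against such a URL would typically return JSON, which both the Android client and third-party programs could consume directly, which is the "access anywhere and anytime" property the REST design is meant to provide.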

The cloud database with RESTful API service, named SynLethCloudDB (a synthetic lethality database), was designed and developed to store biological synthetic lethality data on the cloud, where users can access it through the API service.

The Android application with data visualization and OCR features, named SynLeth Mobile, was designed and developed to present the data in tables and charts, enabling users to conduct drug testing with different datasets and algorithms and to retrieve related data from the cloud database using the OCR feature.

With the SynLethCloudDB, the following benefits were achieved:

• Better scalability
The database service in Azure can easily scale up or down based on the database performance.

• Better query performance
With the auto-indexing configuration, the cloud database achieves better query performance.

• Easier data accessing
The RESTful APIs in SynLethCloudDB provide a convenient way for researchers and programmers to access the data for experiments or program implementation.

• Better design architecture
The API web application in SynLethCloudDB was implemented in the object-oriented language C#. The design has high modularity and high cohesion with low coupling.

• Larger development potential
With the API web application and the Azure platform, there is immense potential to integrate with other data-related projects, such as machine learning and data mining.

This Final Year Project has been an excellent learning experience. I would like to thank the school for providing this study opportunity. I would also like to express my gratitude and thanks to my Final Year Project supervisor, Assoc Prof Zheng Jie, for his patient guidance, inspiration, and valuable suggestions throughout the project.

SCSE YouTube link:

A Multimedia Transcription System

Author: Nguyen Huy Anh (Computer Science)
Supervisor: Assoc Prof Chng Eng Siong

With the advent of computing, a huge amount of data is created every day. Most of the data are unstructured or semi-structured, and need to be processed in order to derive meaning. For multimedia data (audio and video), a textual representation is often desirable, and there are two ways to obtain such a representation — transcription and captioning. Both processes are well-defined pipelines of multiple components. However, for each component there are many existing implementations, each with its own input and output formats, which makes them difficult to integrate into a pipeline. The pipeline itself is difficult to maintain, since any change or upgrade to a component can potentially break it. Furthermore, as the pipeline changes, there is no mechanism to keep track of output versions; this capability is important for research purposes.

This project proposes an integrated processing system performing transcription and captioning on a wide range of audio and video inputs — single-file audio or video as well as multi-channel audio recordings. The project aims to design a system architecture that allows for modularity and extensibility, keeps track of different component and output versions, and performs robustly across many scenarios. The project incorporates Python ports of existing modules from various efforts of the Speech and Language Research Group in the School of Computer Science and Engineering, as well as new Python modules that realize the processing pipeline — transcription, captioning and visualizations of transcripts and captions.
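The article describes the modules but not their interfaces; as a hedged sketch (stage names, keys and the version-tracking scheme are invented for illustration, not the project's actual API), a modular pipeline with a uniform stage contract and a simple version trail might look like:

```python
from typing import Callable, Dict, List

# Every stage takes a plain dict and returns a plain dict, so components
# with differing internal formats can be chained through one shared
# representation. Names and keys here are illustrative only.
Stage = Callable[[Dict], Dict]

def run_pipeline(stages: List[Stage], artifact: Dict) -> Dict:
    """Apply stages in order, recording which stages produced the output."""
    for stage in stages:
        artifact = stage(artifact)
        artifact.setdefault("versions", []).append(stage.__name__)
    return artifact

def segment(a: Dict) -> Dict:
    a["segments"] = [a["audio"]]          # placeholder segmentation
    return a

def transcribe(a: Dict) -> Dict:
    a["transcript"] = f"<text for {len(a['segments'])} segment(s)>"
    return a

result = run_pipeline([segment, transcribe], {"audio": "talkshow.wav"})
print(result["versions"])   # ['segment', 'transcribe']
```

Because each stage only sees the shared dict, swapping one transcription implementation for another does not disturb the rest of the chain, and the recorded stage list gives a minimal form of the output versioning the project calls for.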

The project was evaluated on existing audio recordings of talk shows (Singapore's 93.8FM), video recordings (Singapore Parliament proceedings) and multi-channel recordings (a four-person conversation about the Singapore Army). The system met all requirements and demonstrated its usefulness.

The open-sourced portion of this project is currently available at

For more information, please contact:

Assoc Prof Chng Eng Siong
Speech and Language Research Group
School of Computer Science and Engineering

SCSE YouTube link:

