Recently, I like to read articles that relate between Indonesia’s Presidential Candidates and IT infrastructure. I can argue or giving inputs to public because IT is my daily stuff.
I would like to continue what I have said previously in my facebook post about what I can do in 5 minutes compare to Jokowi’s IT consultant did in 2 weeks. Below are list of articles that make me want to discuss about it;
Listed above are articles on Jokowi’s program that use identification card for poor student to have free education in Solo and Jakarta. It’s not running well due to corruption and vulnerability on IT implementation.
After reading articles above, we can conclude that;
- Its still using manual input on Excel
- Data inputted to PHP – MySQL based application
- There are +/- 110,000 students record in Solo and +/- 401,767 students record in Jakarta.
- The IT consultant input the data from Excel to Database in 2 weeks with help from local CPNS team
- Time to filter duplicate records not mentioned on the article.
- Corruption can be identified by 9,006 duplicate names found on the list, indicated by identical full name and parents name (mother’s name).
It annoyed me so much that makes me want to know why it took so long time just to find the problem is. So, I made some simulation for it. To make it fair I will measure all time spent to do each activities using stopwatch and time indicated by the MySQL server to do a query.
I tried to simulate the situation by using Ubuntu Server box that I have
I installed MySQL server by using this command
sudo apt-get install mysql-server
with total internet speed 26Mbps, I download and set up the root username and password for the database. It took me around 2 min to have database complete installation.
We need to have Excel sheet to simulate 110,000 students record in Solo. Luckily, I have a dumped csv record file that contains +/- 210,743 records (twice the data we needed) with 7 columns for each records, its size almost 25,4 MB from http://transparency.ct.gov/html/downloads.asp.
In Indonesia, students’ record usually have below information;
- Student Identification Number
- Full Name
- Postal Code
- Phone number
- Parents Name
- Parents Address
- Date of school acceptance
The size of the Excel file after adding 4 additional columns and its duplicate data around +/- 26,3MB. Not to forgot to mention that this Excel file was save in CSV data type.
First, we access the database
mysql -u root -p
enter your MySQL root password. check listed databases (0,00 sec)
Create database for simulation (0,00 s)
CREATE DATABASE sim_kjp;
Swicth to the newly created database
Create table for simulation (0,1 s)
CREATE TABLE IF NOT EXISTS `kjp_record` ( `sid` varchar(64) COLLATE utf8_unicode_ci NOT NULL, `fullname` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '-', `pob` varchar(64) COLLATE utf8_unicode_ci NOT NULL DEFAULT '-', `dob` date NOT NULL DEFAULT '0000-00-00', `address` text COLLATE utf8_unicode_ci NOT NULL, `city` text COLLATE utf8_unicode_ci NOT NULL, `zip` varchar(8) COLLATE utf8_unicode_ci NOT NULL, `phone` varchar(32) COLLATE utf8_unicode_ci NOT NULL, `religion` varchar(32) COLLATE utf8_unicode_ci NOT NULL, `parentsname` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `parentsaddress` text COLLATE utf8_unicode_ci NOT NULL, `doa` date NOT NULL DEFAULT '0000-00-00' ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
After the database, table and everything has been set. We quit from the MySQL shell.
Now its the time to import the CSV file into the database. I’m going to use mysqlimport command to import large file size with > 200,000 line of records.
mysqlimport --ignore-lines=1 --fields-terminated-by=, --columns='sid,fullname,pob,dob,address,city,zip' --local -u root -p sim_kjp /home/username/kjp_record.csv
[quote]The above command will import kjp_record.csv into kjp_record table inside sim_kjp database using username root on mysql matching columns; sid, fullname, pob, dob, address, city, zip. First line of the CSV ignored due to its a title for each columns. each column data separated by ‘,'[/quote]
Time to upload the data itself and insert whole records to the specific columns on a server with 4GB RAM and 128GB SSD is surprisingly took only 2-3 seconds.
To check whether its failed or not even though the message notification there’s no error I use the SQL query;
SELECT COUNT (*) FROM kjp_record;
I must said that I have to correct my statement that it only took maximum 3 seconds to upload data to the database not 5 minutes as I have said previously 😉
To filter whether it has duplication on each record we can simulate it using SQL query.
SELECT t1.sid AS SID_1, t2.sid AS SID_2,t1.fullname AS dup_name1, t2.fullname AS dup_name2, t1.parentsname AS pdup_name1, t2.parentsname AS pdup_name2 FROM kjp_record AS t1, kjp_record AS t2 WHERE t1.sid != t2.sid AND t1.fullname LIKE CONCAT('%',t2.fullname,'%') AND t1.parentsname LIKE CONCAT('%',t2.parentsname,'%')
Explanation for SQL query above is
[quote]table kjp_record in database sim_kjp is being aliased twice so we can make comparison between 2 tables. Duplicate data indicate by similar name and similar parent’s name that’s why I’m using AND clause in WHERE. This query can find modified fullname such as M. Andi and Ma Andi with similar parent’s name Andi Pratama or A. Pratama. I also filter the result so it will not compare same student by rejecting same student ID.[/quote]
My hope is; in the near future we can see improvement on Indonesia’s Goverment IT infrastructure and application quality because after reading the source article I can conclude that Indonesia’s Government IT quality is still below standards. Its like seeing a teenanger that still learning how to program application.
Hopefully this article will help you all to understand IT much better.
I can do in 5 minutes what Jokowi’s IT consultant did in 2 weeks by M. Aryo N. Pratama is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at http://www.halilintar.org.
Permissions beyond the scope of this license may be available at http://www.halilintar.org.