Spammers have had an opportunity to mine more than 10 lakh email IDs from the public's responses on net neutrality after TRAI published them on its website last month. But how exactly do they do it? Read on to find out …
On April 27, 2015, the Telecom Regulatory Authority of India (TRAI) made public all the responses to its consultation paper on whether over-the-top (OTT) services such as WhatsApp need to be regulated, a question central to net neutrality. TRAI had published the 'Consultation Paper On Regulatory Framework for Over-the-top (OTT) services' on its website on 27th March, 2015, asking stakeholders to send their comments to email@example.com by 24th April, 2015. The paper asked the public a total of 20 questions on the topic, including whether OTT services like WhatsApp and Skype should pay extra for the data consumed by their users.
In response, TRAI received more than 10 lakh comments and has now put all of them, including the senders' email IDs, on its website. Under the heading "Comments received from the stakeholders towards the Consultation paper on Regulatory Framework for OTT services", TRAI has published all the email responses, sorted into three main sections.
If you sent an email response to TRAI, here's how you can spot your email address on the TRAI website. First, visit http://trai.gov.in/Comments/Comments-List003.pdf. There, the email responses are listed by date, and you can select any date period; for instance, if you click http://www.trai.gov.in/Comments/14-April-2/14-April-2.html, you will be led to a table of emails and their corresponding email addresses. You can then view an individual respondent's email, such as http://www.trai.gov.in/Comments/14-April-2/Kailash%20Diengdoh-20150413-2306349049.html.
This move drew widespread criticism, with critics slamming TRAI for betraying the trust of the people and compromising their privacy.
TRAI's decision to make all these email IDs public has made it easy for spammers to get a huge database of email IDs in one go. Spammers and database sellers now have the email IDs of a cohort of more than 10 lakh Indians who (largely) support net neutrality. And it is not just the respondents' email IDs that are exposed: many email signatures contain phone numbers, mobile numbers, and official or residential addresses. All this personal information is a bonanza for every spammer.
In this digital age, where email IDs are linked to accounts with banks, insurance companies and other financial institutions, and to social media platforms like Twitter, Facebook, LinkedIn and others, exposing so many IDs carries serious consequences. Technically speaking, the email IDs listed under http://trai.gov.in/Comments/Comments-List003.pdf are a jackpot for spammers.
As criticism mounted from the public, on social networks and in the news media, TRAI on April 30, 2015, altered the email IDs of all stakeholders who had responded to its consultation paper. TRAI resorted to 'munging' all the addresses to deter spammers from automatically copying them. In address munging, an email ID is rewritten by replacing symbols such as '@' with '(at)' and '.' with '(dot)'. But TRAI should have realized that this exercise is futile, as even a novice spammer can easily write a script to recover the original email IDs.
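To see why munging offers so little protection, it helps to look at what the transformation actually does. The following is a minimal Python sketch, not TRAI's code: the function names `munge` and `redact`, the simplified email pattern and the placeholder text are all hypothetical. It contrasts munging, which is a one-to-one character swap that leaves the full address in place, with outright redaction, which removes the address altogether.

```python
import re

# Hypothetical, deliberately simple address pattern for illustration only;
# the full address grammar (RFC 5322) is far broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def munge(address: str) -> str:
    """Munge an address as described above: '@' -> '(at)', '.' -> '(dot)'.

    Every character of the original survives, only spelled differently.
    """
    return address.replace("@", "(at)").replace(".", "(dot)")


def redact(text: str, placeholder: str = "[email withheld]") -> str:
    """Replace every address matched in text with a fixed placeholder.

    Unlike munge(), the address itself is discarded, so nothing remains to recover.
    """
    return EMAIL_RE.sub(placeholder, text)


if __name__ == "__main__":
    print(munge("jane.doe@example.com"))           # jane(dot)doe(at)example(dot)com
    print(redact("Regards, jane.doe@example.com"))  # Regards, [email withheld]
```

The munged output still contains the complete address, which is exactly why the exercise did little to stop automated copying; redaction, by contrast, leaves nothing behind to reverse.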
So, how does one write a script to parse all the 10 lakh-plus email IDs from the TRAI website?
Let's look at some of the scripts that have been written to do exactly that.
Parse and Spam
While browsing the Internet, I found three scripts written by different programmers in Python, JavaScript (Node.js) and R. All three share the same objective: extracting the email IDs from the TRAI website, which spammers can then use for spamming. I have listed these scripts in the following sections.
For obvious security reasons, I have not listed the complete code, as it could help spammers misuse it.
The following Python script extracts the email IDs from the pages listed in http://trai.gov.in/Comments/Comments-List003.pdf.
response = requests.get(u)
tree = html.fromstring(response.text)
data = tree.xpath(‘//tr/td/text()’)
for element in data:
if “(at)” and “(dot)” in element:
final = parseaddr(element.replace(“(at)”,”@”).replace(“(dot)”,”.”))
with open(“test.txt”, “a”) as myfile:
if “@” and “.” in final:
myfile.write(final + “\n”)
counter2 += 1
The following JavaScript (Node.js) script scans for .html files in the current directory and extracts the names and email addresses from them.
var currentFile = fileList[fileCount];
console.log(“parsing file.. ” + currentFile);
fs.readFile(currentFile, ‘utf8’, function (err, data)
var trData = data.split(“</tr>”);
for(var i = 0; i < trData.length; i++)
var tdData = trData[i].split(‘</td>’);
var finalData = tdData
finalData = finalData.replace(“<td>”, “”);
finalData = finalData.replace(“>”, “”).trim();
finalData = finalData.split(” <”);
var nameData = finalData;
var emailData = finalData;
The following R script is used to parse, clean and write TRAI emails into a CSV file.
mail.list.df$V2 <- sub(pattern = “\\([a-z]+\\)”, replacement = “.”, x = mail.list.df$V2)
mail.list.df$V2 <- sub(pattern = “>”, replacement = “”, x = mail.list.df$V2)
## Naming the columns properly
colnames(mail.list.df) <- “name”
colnames(mail.list.df) <- “email”
## Write the names and emails to a csv file
write.table(x = mail.list.df, file = “TRAI-Email-list.csv”, sep = ‘,’, quote = FALSE, row.names = FALSE, append = TRUE)
A Brief Conclusion
It is disappointing that the Telecom Regulatory Authority of India has published more than 10 lakh email addresses of people who wrote to it in support of net neutrality. There is no doubt that by posting the email responses online, complete with respondents' personal details, TRAI has created a treasure trove of information for spammers and phishers, besides putting your confidential data at risk. An agency entrusted with regulating India's telecom sector is expected to act with more responsibility and sense, and not make these email IDs public.
People who supported net neutrality and sent their responses to TRAI should not be surprised if spam messages land in their inbox or spam folder, since their email IDs, along with other personal information such as mobile phone numbers, have been made public by TRAI. In a country like India, where digital literacy is still limited and awareness of online security and privacy is low, such carelessness by TRAI will only fuel cyber-crime. The truth is that in the name of 'transparency', TRAI has put people's privacy at stake.
The author is a Senior Editor at Bitstream Mediaworks.
He has an active interest in IT Security.