Thursday, December 1, 2016

How to send bulk email to your students using R

In this post I describe a technique I use to send personalized bulk emails to my students at the open and distance education university where I teach (the Universidad Nacional Abierta in Venezuela, UNA for short). Being able to mass-email students with personalized messages should be an essential capability in any distance education setup. However, since the UNA does not provide teachers with institutional tools for this kind of outreach, we teachers are left to our own devices. I use R for all sorts of teaching and research tasks, and this was no exception. That is how I came across mailR, an R package for sending email, which proved to be the right tool for the job.

In what follows I give some context to explain why mass-emailing is a necessary strategy for us course facilitators at the UNA. If you are only interested in how to implement it in R, you can skip ahead to the section "Materials needed for this experiment".

A brief outline of the educational process at the UNA

Perhaps many of you are not aware that the UNA was one of the first open and distance education institutions of higher learning in the world when it was founded in 1977, following Great Britain's Open University, created in 1970 [1]. Because it was closely modeled on the Open University's instructional and organizational model of the time, it is still very much a product of the principles of, and approach to, distance education prevailing back then. It was designed to bring education to the masses in remote or rural areas of the country and to serve working adults who could not meet the time and place constraints imposed by traditional, campus-based universities. The mass-produced paper book was the principal medium of instruction. Although other media, such as VCR tapes and radio/TV broadcasts, were at some point produced or considered, their use never really caught on.

The production of these books, and indeed the entire educational process of the UNA, is a typical industrial-age process entailing a division of labor among the institution's academic personnel. On the one hand, there are the content specialists, evaluators and validators at the central level of the institution, who are directly involved in book and course production. These staff members also create the exams and other assessment instruments each semester, and they schedule the exam dates of all courses. Content specialists, evaluators and validators have no contact with students; that is the role of course facilitators, such as myself, who operate at the periphery of the institution in the various regional locations across the length and breadth of Venezuela.

The primary task of course facilitators is to serve as a bridge between students and the course contents embodied in the textbooks. This means that if, during self-study, a student has trouble understanding some of the course content, he or she contacts the facilitator for individual assistance. Occasionally, facilitators also hold workshops for groups of students. However, attendance at these workshops, which resemble a more traditional learning environment in which a teacher addresses the class on some subject, is optional. Grading exams is another facilitator task, although the exams themselves are administered by another type of personnel, the proctors, who are not necessarily academics.

As can be seen, the UNA's distance learning model separates teaching tasks and is centered on the concept of self-learning. Facilitator-student interaction is predominantly in person. I don't have any statistics to back this up, but I believe that for most UNA facilitators, phone interaction with students is rare and internet interaction (email, forums or videoconferencing) is virtually (no pun intended) nonexistent. In this regard, my personal case is an outlier. For example, in the July-September trimester, 1605 of my 1643 facilitator-student interaction events (about 98%) were online: grade consultations in my online system, messages in my blog's chatbox, email consultations and, of course, the bulk emailing system I present in this entry.



On the necessity of creating a better communicational platform

As the previous section makes evident, the UNA's model of distance education is outdated and urgently needs to be revamped. Unfortunately, I don't see this happening soon from within the institution itself, which maintains a very conservative structure and hierarchy. I treat this blog as my own personal experiment in how a facilitator like me can create an alternative platform for teacher-student interaction using free or easily available technology. Such a platform has to take into account the students' "natural" communication channels.

By far, Facebook is the social network most frequently used by the student population in Venezuela (and by the general population of this country, for that matter). However, as an educational tool, Facebook has its disadvantages: it is a very noisy and distracting channel, oriented more towards aimless chatter and gossip [2]. Furthermore, although Facebook is immensely popular, there is a more universal communication channel in the digital era: email.

In 2013, there were approximately 1.19 billion Facebook users and 0.2 billion Twitter users in the world, versus 2.5 billion email users [3]. Despite seeming less "fashionable" than today's popular social media, email is, and will certainly continue to be, the most widely used digital communication tool, if only because you need an email address to register for any of these social media! Truisms aside, email has some important advantages over social media in the educational context: we can be more certain that a student will read a message if we send it by email than if we post it on Facebook, Twitter, Google+, etc. Besides being a less noisy medium, email is also more personal. Therefore, in the distance education context, email is arguably the most natural communication channel between facilitator and students.

In distance education, distance is both a strength and a weakness. Distance is an essential attribute of open and flexible education, in which the learner's autonomy, time and obligations to family and work are respected. However, too much distance alienates students from the learning community, which in my opinion accounts for a large portion of dropouts in MOOCs and other distance learning contexts. This is where email comes in. Addressing each student by name and making the student feel that the conversation is about him or her personally works wonders in these contexts, where communication tends to be generic and anonymous. Email enables the instructor to build a more personal rapport with students and to make them feel personally taken into account.

This is why email is the tool of choice in online marketing, and why media experts like Michael Hyatt, Anna Hoffman, Neil Patel or Jon Morrow insist that a blog's most important asset is its list of email subscribers. Admittedly, marketing brings to mind sales, profits and customers, whereas education is seen as an entirely different phenomenon. At the end of the day, however, we educators are trying to exert influence over our students, and if we consider education to be a service to others, then it wouldn't be a bad idea to think of our students as customers. Exerting influence, leadership and the use of digital communication platforms are some of the topics Michael Hyatt discusses in his blog [4].

Why do I employ R with the mailR package for bulk emailing instead of other tools?

There are, to be sure, services like MailChimp that let you run bulk emailing campaigns. Although MailChimp has a free plan that allows you to send up to 12,000 emails a month to up to 2,000 subscribers [5], and it seems very easy to set up and get running, for me it has one major disadvantage: due to DMARC policies, MailChimp cannot send bulk emails from addresses on free email provider domains such as Gmail, Yahoo and Outlook. Granted, UNA faculty members have a UNA-domain email address (mine is jromero@una.edu.ve). However, I rarely use it, and it is more convenient for me to use my Gmail address (it is my natural communication channel). Furthermore, I'm not sure about the bulk-emailing restrictions on my UNA-domain address, although in truth I haven't explored the matter thoroughly. As for purchasing a hosting/domain plus email package with adequate server response times and capacity for concurrent users, while I'm sure this is very inexpensive in the rest of the world, it is almost prohibitive for most Venezuelans, whose socialist government has imposed an extremely restrictive currency exchange regime [6].


In my case, there is another disadvantage to ready-made tools like MailChimp. While addressing recipients by name is easy with MailChimp, I'm not sure whether you can easily configure it to use masculine or feminine adjective forms (in Spanish, adjectives change form according to the gender of the person or thing described). Or, as a teacher, I might want to say one thing or another in my message depending on each student's individual situation (e.g., I might advise students who are failing or at risk of failing to take some remedial action). These sorts of things require a tool for programmatically tailoring the message to each student, that is, a general-purpose programming language like R. At least to me, this seems much easier, since I already use R for a whole bunch of other tasks, as this blog testifies.
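As an illustration of this kind of tailoring, here is a minimal R sketch. It assumes the roster has a gender column coded "M"/"F" (as in the sample roster shown further below) and a hypothetical numeric grade column; the helper names, the "Estimado/Estimada" greeting, the Spanish advice sentence and the passing grade of 6 are all illustrative assumptions, not part of the script in this post.

```r
# Sketch only: gendered greeting plus conditional advice per student.
# Assumes a 'gender' column coded "M"/"F" and a hypothetical numeric
# 'grade' column; the threshold and the wording are illustrative.
greeting <- function(student) {
  # Spanish adjectives agree in gender: "Estimado" (m.) vs "Estimada" (f.)
  adjective <- if (student$gender == "F") "Estimada" else "Estimado"
  paste0(adjective, " ", student$firstname, ":")
}

advice <- function(student, passing_grade = 6) {
  # Only students below the assumed passing grade get the extra sentence
  # ("I recommend reviewing the material before the next exam.")
  if (student$grade < passing_grade) {
    "Te recomiendo repasar el material antes de la siguiente prueba."
  } else {
    ""
  }
}

roster <- data.frame(firstname = c("PEDRO", "JOSEFINA"),
                     gender = c("M", "F"),
                     grade = c(4, 8),
                     stringsAsFactors = FALSE)

# Print the tailored opening for each row of the roster
for (i in seq_len(nrow(roster))) {
  cat(greeting(roster[i, ]), advice(roster[i, ]), "\n")
}
```

The same two helpers could be called from inside a message-building function, so each student's HTML body picks up the right adjective and, only when it applies, the remedial advice.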

Now that I have (I hope) justified my use of R for bulk emailing, I must point out that there are several packages for it: sendmailR, mailR and gmailr. The latter looks interesting, and on the author's GitHub site [7] there is a tutorial of sorts on setting it up to send bulk emails from a Gmail account, coincidentally with an online course example much like the one I'm discussing in this post. However, I had tried this package before and, for some reason, could not get it to work. In choosing between sendmailR and mailR, I considered that mailR's send.mail function has an option for sending either plain-text or HTML mail, whereas the equivalent sendmailR function had plain text hardcoded (a bug that has apparently since been fixed [8]). Being able to send HTML was important to me, as I wanted to send nicely formatted emails like this one:

UNIVERSIDAD NACIONAL ABIERTA
CENTRO LOCAL ANZOATEGUI
UA EL TIGRE

Hola PEDRO:

Ante todo quiero desearte éxito en este semestre 2016-1 y particularmente en la primera prueba parcial de la asignatura XXX (código xxx) que presentarás este sábado. El propósito de este mensaje es presentarme: mi nombre es José Romero, soy egresado en la Licenciatura de Matemáticas de la UNA y actualmente, soy el asesor del área de matemáticas en la unidad de apoyo de El Tigre.

Estoy contactando por correo a todos los estudiantes de la asignatura XXX (xxx) de la UNA a nivel nacional para invitarlos a que visiten mi blog.

Huelga decir que en caso de cualquier duda, no dudes en consultarme por el buzón de mensajes del blog o a través de mi correo: jlaurentum@gmail.com. (NOTA: No respondas a este correo ya que es una cuenta para envios automatizados solamente). Estoy a tus ordenes,

Atentamente,


José Romero



Materials needed for this experiment

Besides an R installation with the mailR package, you will need a file with your students' data: their email addresses, their names, and any other relevant information you wish to convey to them, such as grades or personal feedback. The file needs to be a CSV file, which is essentially a text file in which each line is a row of the data table and the fields (columns) in each line are separated by a special character such as a comma, a semicolon or a tab [9]. If you have your data in a spreadsheet, you can easily convert it to CSV by using "Save As" and choosing the "Text/CSV" file type. Take note of the separator character you choose: you will specify the same character when you read the CSV file into R. Your CSV file could contain something like this:

id;lastname;firstname;gender;c_code;una_location;email_address
12345678;PEREZ;PEDRO;M;126;02-01;pedroperezm@dontcare.com
87654321;PEREZ;JOSEFINA;F;126;02-01;jperez@noneofyourbizwaks.com
...

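To check that a file like this is read the way the script expects, here is a small sketch that parses the sample rows above with read.csv2(), which assumes ";" as the separator. It uses the text = argument instead of a file name so that it runs on its own; the colClasses setting, which keeps id and c_code as text in case some codes have leading zeros, is my own choice and not part of the original script.

```r
# Sketch: read a semicolon-separated roster like the sample above.
# read.csv2() defaults to sep = ";" and dec = ","; 'text =' replaces
# the file name here so the example is self-contained.
roster_text <- "id;lastname;firstname;gender;c_code;una_location;email_address
12345678;PEREZ;PEDRO;M;126;02-01;pedroperezm@dontcare.com
87654321;PEREZ;JOSEFINA;F;126;02-01;jperez@noneofyourbizwaks.com"

mail_list <- read.csv2(text = roster_text, as.is = TRUE,
                       # keep identifiers as character, not numbers
                       colClasses = c(id = "character", c_code = "character"))

str(mail_list)  # inspect column names and types before sending anything
```

With a real file you would write read.csv2("roster.csv", as.is = TRUE, ...) instead; the column names must match the ones message_text() and the sending loop refer to (firstname, lastname, email_address).
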
It is worth noting that, in my experience, with this R/mailR method you can send up to 100 email messages a day from free email domains such as Gmail, Yahoo or Outlook. If you go beyond this limit, your account will be temporarily suspended for a day (24 hours), after which you can continue to send messages as usual (never exceeding the maximum quota of 100 emails a day). While experimenting with this method, I discovered that I had to add a time delay between emails. In the script below, you will see that I use random delays drawn uniformly between 3 and 6 seconds. Using larger minimum and maximum values, and a wider spread, for the random delays may help the sending process run smoothly, without the script being interrupted by this error:

Error in ls(envir = envir, all.names = private) : invalid 'envir' parameter
If you get the above error, simply modify the first and last indexes of the for loop so that the script resumes the batch sending process from the student where it left off. The script is the following:

R script for batch emailing

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
#Batch email sending script
#The following function crafts the message for each student,
#represented by the x parameter
message_text <- function(x) {
  tmp <- paste(c(
    '<!DOCTYPE html>',
    '<html xml:lang="es" lang="es">',
    '<head>',
    '<meta http-equiv="content-type" content="text/html;charset=utf-8" />',
    '<meta name="author" content="José Loreto Romero Palma" />',
    '<STYLE type="text/css">',
    'h1 { text-align: center; font-family: Helvetica, FreeSans; font-size: 40px;',
    'font-variant: small-caps }',
    'h2 { text-align: center; font-family: Helvetica, FreeSans; font-size: 32px;}',
    'h3 { text-align: center; font-family: Helvetica, FreeSans; font-size: 28px;}',
    'p { text-align: justify; font-family: Helvetica, FreeSans; font-size: 16px;',
    'width: 640px}',
    'ul { text-align: justify; font-family: Helvetica, FreeSans; font-size: 16px}',
    'td { font-family : Helvetica, FreeSans; font-size: 12px}',
    '</STYLE>','</head>','<body>',
    paste(c(
    paste0("<table><tbody><tr><td width='60px'>",
    "<IMG SRC='https://lh3.googleusercontent.com/hgDApmnzAf2TrP9lzTUc3U9ZG9",
    "EBaUn9s9OM4DWD4BXtO6j51GCfgLyfyjTdHJ5G8CfXD6_XeipYaQ=w1366-h768-no' ",
    "NAME='logo_UNA' ALIGN='LEFT' WIDTH='51' HEIGHT='51' BORDER='0'>",
    "</td><td width='580px'>UNIVERSIDAD NACIONAL ABIERTA<br/>",
    "CENTRO LOCAL ANZOATEGUI<br/>UA EL TIGRE</td></tr></tbody></table>"),
    paste0("<br/>"),
    paste0("<p>Hola ",x$firstname," ",x$lastname,":</p>"),
    paste0("<p>",
    "Ante todo quiero desearte éxito en este semestre 2016-1 y ",
    "particularmente en la primera prueba parcial de la asignatura ",
    "XXX (código xxx) que presentarás ",
    "este sábado.  El propósito de este mensaje es ",
    "presentarme: mi nombre es <b>José Romero</b>, soy egresado en la ",
    "Licenciatura de Matemáticas de la UNA y actualmente, soy el ",
    "asesor del área de matemáticas en la unidad de apoyo de El Tigre.</p>"),
    paste0("<p>",
    "Estoy contactando por correo a todos los estudiantes de la asignatura ",
    "XXX (xxx) de la UNA a nivel nacional para invitarlos a que ",
    "visiten mi <a href='http://unamatematicaseltigre.blogspot.com'",
    " target='_blank'>blog</a>.</p>",
    "<p>Huelga decir que en caso de cualquier duda, no dudes en ",
    "consultarme por el buzón de mensajes del blog o a través de mi ",
    "correo: <a href=\"mailto:jlaurentum@gmail.com\">jlaurentum@gmail.com</a>.",
    " (NOTA: No respondas a este correo ya que es una cuenta para envios ",
    "automatizados solamente). Estoy a tus ordenes,</p>"),"","",
    "<p>Atentamente,</p>","<br />","<p>José Romero</p>")),
    '</body>','</html>'),
    collapse="")
    return(tmp)
}

library(mailR)
#note: Gmail may require "less secure app access" or an app password for SMTP
#note: the daily quota is 100 emails 
#read in the mailing list in the csv
mail_list <- read.csv2("roster.csv",as.is=TRUE)
file_h <- file('test.html',open="w")
writeLines(message_text(mail_list[1,]),file_h)
close(file_h)
ext_f <- file("mail.out",open="w")
close(ext_f)
for (recipient in 1:nrow(mail_list)) {
  email <- message_text(mail_list[recipient,])
  send.mail(from="mygmail@gmail.com",
    to=as.character(mail_list[recipient,]$email_address),
    subject="Invitation to the unamatematicaseltigre blog",
    body=email,
    html=TRUE,
    authenticate=TRUE,
    smtp = list(host.name = "smtp.gmail.com",
    user.name = "mygmail", passwd = "mypasswd", ssl = TRUE),
    encoding = "utf-8",send=TRUE)
  print(mail_list[recipient,])
  Sys.sleep(runif(n=1,min=3,max=6))
  #write each recipient to a file
  ext_f <- file("mail.out",open="a")
  writeLines(text=paste0("[",recipient,"] ",
    paste0(as.character(mail_list[recipient,]),collapse="\t")),
    sep="\n",con=ext_f)
  close(ext_f)  
}


Lines 4-52 define a function, message_text, that builds the HTML message body for a given recipient (a one-row slice of the roster data frame). It produces a message body like the one shown in the example above. Notice the STYLE element in the HTML header, which defines the CSS that formats the message, and the IMG tag in lines 23-25, which includes the UNA logo.
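If you prefer to keep the HTML template in one place, here is an alternative sketch (not the function used in the script) that fills placeholders with sprintf() and escapes characters such as & and < in the student's name before inserting it, so that an unusual name cannot break the markup. The helper names escape_html and build_body, and the shortened template, are hypothetical.

```r
# Sketch: fill an HTML template with sprintf(), escaping the name first.
escape_html <- function(s) {
  # '&' must be replaced first so the other entities are not double-escaped
  s <- gsub("&", "&amp;", s, fixed = TRUE)
  s <- gsub("<", "&lt;", s, fixed = TRUE)
  s <- gsub(">", "&gt;", s, fixed = TRUE)
  gsub("\"", "&quot;", s, fixed = TRUE)
}

# Three %s placeholders: first name, last name, body paragraph (raw HTML)
template <- paste0("<html><body>",
                   "<p>Hola %s %s:</p>",
                   "<p>%s</p>",
                   "</body></html>")

build_body <- function(x, paragraph) {
  sprintf(template, escape_html(x$firstname), escape_html(x$lastname), paragraph)
}

student <- data.frame(firstname = "PEDRO", lastname = "PEREZ & CIA",
                      stringsAsFactors = FALSE)
cat(build_body(student, "Te invito a visitar mi blog."))
```

Note that the paragraph argument is inserted as-is, since it is meant to contain your own HTML; only the values that come from the roster are escaped.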

Line 58 reads the roster.csv file into a data frame stored in the mail_list variable. The read.csv2 function assumes that the separator is a semicolon (;) and the decimal mark is a comma (as is usual in Spanish-speaking countries). If your file uses other settings, you can use the read.csv or read.table functions instead. Lines 59-61 simply write a test message to an HTML file. I usually run the script up to these lines first, to make sure the message text comes out the way I want before starting the batch sending process. The for loop in lines 64-83 is responsible for emailing all the students in the roster data frame.
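Along the same lines as the test file, here is a sketch that writes one preview file for each of the first few recipients rather than only the first, so you can open several in a browser before sending anything. preview_messages and toy_body are hypothetical names; in the real script you would pass mail_list as the roster and message_text as make_body.

```r
# Sketch: write preview HTML files for the first n recipients.
preview_messages <- function(mail_list, make_body, n = 3, dir = tempdir()) {
  n <- min(n, nrow(mail_list))
  paths <- character(n)
  for (i in seq_len(n)) {
    paths[i] <- file.path(dir, sprintf("preview_%03d.html", i))
    # Convert to UTF-8 and write the bytes unchanged (accents survive)
    writeLines(enc2utf8(make_body(mail_list[i, ])), paths[i], useBytes = TRUE)
  }
  paths
}

# Hypothetical stand-in for message_text(), to keep the example self-contained
toy_body <- function(x) paste0("<p>Hola ", x$firstname, ":</p>")

roster <- data.frame(firstname = c("PEDRO", "JOSEFINA"), stringsAsFactors = FALSE)
files <- preview_messages(roster, toy_body)
print(files)
```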

The for loop's index variable, recipient (line 64), is an integer going from 1 to the number of rows in the roster data frame. Bear in mind that the daily quota is 100 emails if you are sending from a free account such as Gmail or Yahoo. Therefore, if your roster has more than 100 students, you will want to split it into groups of 100 and send to each group on a different day. Moreover, a problem with a particular address may cause the script to stop with an error (the Error in ls(envir = ... mentioned earlier). You therefore need some way of keeping track of the emails you have sent.
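Splitting the roster into daily groups can also be done in R itself. Here is a minimal sketch, assuming the 100-message daily quota described above; split_into_batches is a hypothetical helper name and the 250-row roster is made-up test data.

```r
# Sketch: split a roster into consecutive groups of at most batch_size rows.
split_into_batches <- function(mail_list, batch_size = 100) {
  # Rows 1-100 get id 1, rows 101-200 get id 2, and so on
  batch_id <- ceiling(seq_len(nrow(mail_list)) / batch_size)
  split(mail_list, batch_id)
}

roster <- data.frame(email_address = sprintf("student%03d@example.com", 1:250),
                     stringsAsFactors = FALSE)
batches <- split_into_batches(roster)
sapply(batches, nrow)  # rows per batch
```

Each element of batches is a data frame that you could save with write.csv2(batches[[k]], sprintf("roster_batch_%d.csv", k), row.names = FALSE) and feed to the script on a different day.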

This is why I create a file (lines 62-63) to which I write the information from each roster row as I send each message (lines 78-82). If the script stops for any reason (besides the error mentioned above, power blackouts are quite common in Venezuela), I can use this file to see from which row of the roster I should resume the batch sending. Besides, it's always a good idea to keep a record of all emails sent. While the code runs, if you open your email account in a browser, you will see the Sent folder fill up with messages as the script sends them. If the mail server cannot deliver to a certain address, you will get a notification email in your Inbox.
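Finding the resume point can be automated by reading that log back. The sketch below assumes the log format the script writes, one line per sent message starting with "[row]"; next_row_to_send is a hypothetical helper name. Because the script logs a row only after send.mail() returns, a row whose sending raised an error is not logged and would be retried. Note also that lines 62-63 reopen mail.out with open="w", which empties it, so those two lines would have to be skipped when resuming.

```r
# Sketch: compute the next roster row to send from the "mail.out" log.
next_row_to_send <- function(log_file = "mail.out") {
  if (!file.exists(log_file)) return(1L)
  lines <- readLines(log_file, warn = FALSE)
  # Keep only lines that start with "[n]" and extract n
  logged <- grepl("^\\[[0-9]+\\]", lines)
  rows <- as.integer(sub("^\\[([0-9]+)\\].*$", "\\1", lines[logged]))
  if (length(rows) == 0) 1L else max(rows) + 1L
}

log_file <- tempfile(fileext = ".out")
writeLines(c("[1] 12345678\tPEREZ\tPEDRO",
             "[2] 87654321\tPEREZ\tJOSEFINA"), log_file)
start <- next_row_to_send(log_file)
print(start)  # 3
# Resume only if rows remain (start:n counts *down* when start > n):
# if (start <= nrow(mail_list)) for (recipient in start:nrow(mail_list)) { ... }
```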

The actual sending is done by the send.mail function of the mailR package, in lines 66-74. In this example, I'm sending from a fake Gmail account, mygmail@gmail.com. The user name before the at sign (@) of your Gmail address is what you pass to the user.name parameter in line 73. In the same line, you also pass the password you use to log in to your email account as the passwd parameter. The SMTP host name for Gmail is smtp.gmail.com; if you use Yahoo or some other free email provider, you will have to find out that provider's SMTP host name and pass it as the host.name parameter in line 72. A Google search should suffice.
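Two refinements suggest themselves here, sketched below under stated assumptions: reading the credentials from environment variables rather than typing the password into the script, and wrapping each send in tryCatch() so that a failing address is reported and skipped instead of stopping the whole loop. The variable names GMAIL_USER and GMAIL_PASSWORD (which could be set, e.g., in your ~/.Renviron file) and the helper names are hypothetical; the send.mail arguments are the same ones the script above uses. The sender and the pause are passed in as functions so the loop can be tried with a fake sender; whether mailR keeps working normally after the Error in ls(envir = ... error has occurred once in a session has not been verified here.

```r
# Sketch: per-recipient error handling, credentials from the environment.
# GMAIL_USER / GMAIL_PASSWORD are hypothetical environment variable names.
send_via_gmail <- function(to, subject, body) {
  mailR::send.mail(from = paste0(Sys.getenv("GMAIL_USER"), "@gmail.com"),
                   to = to, subject = subject, body = body,
                   html = TRUE, authenticate = TRUE,
                   smtp = list(host.name = "smtp.gmail.com",
                               user.name = Sys.getenv("GMAIL_USER"),
                               passwd = Sys.getenv("GMAIL_PASSWORD"),
                               ssl = TRUE),
                   encoding = "utf-8", send = TRUE)
}

# Returns TRUE/FALSE per recipient instead of stopping at the first error
send_all <- function(addresses, subject, bodies,
                     send_fun = send_via_gmail,
                     pause = function() Sys.sleep(runif(n = 1, min = 3, max = 6))) {
  ok <- logical(length(addresses))
  for (i in seq_along(addresses)) {
    ok[i] <- tryCatch({
      send_fun(addresses[i], subject, bodies[i])
      TRUE
    }, error = function(e) {
      message("Row ", i, " failed: ", conditionMessage(e))
      FALSE
    })
    pause()  # keep the random delay between messages
  }
  ok
}

# Try the loop with a fake sender that fails for one address
fake_send <- function(to, subject, body) if (grepl("bad", to)) stop("SMTP error")
res <- send_all(c("a@example.com", "bad@example.com", "c@example.com"),
                "Hola", c("1", "2", "3"), send_fun = fake_send,
                pause = function() invisible(NULL))
print(res)
```

The returned logical vector tells you which roster rows to retry later, which complements the mail.out log the script already keeps.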

As already mentioned, I introduce a random delay between calls to the send.mail function, so that Gmail is less likely to suspect that I'm sending emails on "autopilot" and temporarily suspend my account. For me, random delays between 3 and 6 seconds worked fairly well, but you may want to experiment with longer delays to be safe.
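It helps to estimate how long a day's batch will take. With delays drawn uniformly between a = 3 and b = 6 seconds, the mean delay is (a + b)/2 = 4.5 s, so 100 messages add about 100 × 4.5 = 450 s, or 7.5 minutes, of waiting on top of the SMTP time; doubling the range to 6-12 s would make that about 15 minutes. The sketch below simply simulates the delays the script draws with runif().

```r
# Sketch: simulate the waiting time for one batch of 100 messages
# with uniform random delays between 3 and 6 seconds.
set.seed(1)
delays <- runif(n = 100, min = 3, max = 6)
range(delays)       # every delay lies between 3 and 6 seconds
sum(delays) / 60    # total waiting time in minutes (expected value 7.5)
```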

Finally, if you have any questions or comments, feel free to let me know in the comment section of this post. I'd be happy to answer them.

Notes

  1. See McIsaac and Gunawardena (2002) for an account on the history and theoretical constructs behind distance education.
  2. See What Is Google+? (An African Perspective), an interesting post by Rotimi Orimoloye for his blog Digital Africa. In it, he argues that Nigerians, who also use Facebook more often than any other social network, should move to Google+, which is better suited to finding out who the experts in a particular field are and to learning about those fields.
  3. See the statistics in the FAQ section of Specific Feeds: https://www.specificfeeds.com/page/faq-email-publishers.
  4. Michael Hyatt, author of a book titled "Platform: Get Noticed in a Noisy World", is an expert on this subject. I strongly recommend visiting his blog https://michaelhyatt.com.
  5. While there are experts on the subject of platforms like Michael Hyatt and the others I have mentioned, I think that out of necessity, I'm on my way to becoming an expert myself on the subject of building a platform with free or freely available tools. I believe that while technology is in some cases widening the gap between the rich and the poor, free and open source technology holds enormous potential as empowering tools for people who, like myself, live in countries with failed economies.
  6. See Premraj, 2014.
  7. Hence the acronym CSV: Comma Separated Values. The separation character can be any character you choose. For this example, we will assume the semicolon (;) is the separation character.

Bibliography




If you found this post interesting or useful, please share it on Google+, Facebook or Twitter so that others may find it too.

