In this post, we work through the linear regression exercise set in semester 2012-1, whose practical assignment is the same as this semester's (2017-1). We use R and my estUNA library to build linear regression models by backward elimination. Along the way, we stress the importance of residual analysis, which among other things can suggest transformations of the variables that improve the regression models.
Monday, May 1, 2017
Thursday, April 6, 2017
estUNA
What is estUNA?
It will eventually be published on the CRAN repository as a package. For now, the image file (which lets you work with the library) is available for download at https://raw.githubusercontent.com/unamatematicaseltigre/estUNA/master/estUNA.
Introduction to R
is a programming environment
This page gives a very brief introduction to the R language as a programming environment. Without claiming to be a complete guide, it presents the concepts needed to use this language as an instructional complement to the statistics and probability courses of the Universidad Nacional Abierta.
A programming environment is an application for creating, running, and debugging programs. Programs are essentially sequences of instructions that tell the computer very precisely what it must do. These instructions are specified in what is called a programming language, and each programming language has its own "grammar" and syntax rules. R is an interpreted programming language, which means that the programmer enters instructions through a console and the R interpreter processes each instruction as it is entered, producing the corresponding output for each instruction in sequence.
Instead of typing the instructions one by one in the console, we can supply the sequence of instructions we want to run in a text file (like the ones we create with Notepad). This is known as a script. A script is a kind of program that always needs an interpreter in order to run. In this guide, we will learn to create our own scripts.
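As a minimal sketch of the difference between typing at the console and running a script, the following writes two instructions to a temporary text file and then asks the interpreter to run them with `source`. The file contents and the variable names `x` and `average` are illustrative assumptions, not part of the original guide.

```r
# Create a small script file (in practice you would write it in a text editor)
script_file <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(2, 4, 6)",       # first instruction: store three numbers
  "average <- mean(x)"     # second instruction: compute their mean
), script_file)

# Ask the R interpreter to run every instruction in the file, in order
source(script_file, local = TRUE)

# The objects created by the script are now available in the session
print(average)
```

Typing the same two lines directly at the console would give the same result; the script simply lets you save and rerun them.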
Friday, December 23, 2016
Merry Christmas 2016 (in R)
plot3D for R:
Thursday, December 22, 2016
Merry Christmas 2016 (with R)
plot3D R package:
Mathematical model of a Christmas tree
- The starting point of the trunk segment, given as a vector with coordinates in \(\mathbb{R}^3\) as \((x_0,y_0,z_0)\).
- The ending point of the trunk segment, as given by the vector with coordinates \((x_1,y_1,z_1)\). Together, the start and end points determine the direction vector of the tree stem as \(\vec{u}=(x_1-x_0,y_1-y_0,z_1-z_0)\). This direction vector will be useful for creating the extension stub, since the extension stub grows in the same direction as the parent stub. It is also used for determining where along the stub the branches start off and in what direction those branches are created.
- The width parameter lwd, which is also the thickness with which the tree stem is drawn as a segment when plotted.
- The depth, which indicates how far outward a branch or tree stem is, as already explained above.
- Slots for three branches and one extension (branch1, branch2, branch3 and extension), which are nothing but lists recursively defined like this one. When a branch or extension is created, these slots are initialized to NULL.
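The structure described in the list above can be sketched as a plain R list. This is only an illustration under assumptions: the constructor name `new_stub`, the helpers `direction` and `count_stubs`, the field names `start` and `end`, and the numeric values used in the example are hypothetical; only `lwd`, `branch1`, `branch2`, `branch3` and `extension` come from the description.

```r
# Hypothetical constructor for one stub (trunk segment, branch or extension)
new_stub <- function(start, end, lwd, depth) {
  stopifnot(length(start) == 3, length(end) == 3)
  list(
    start = start,      # (x0, y0, z0)
    end = end,          # (x1, y1, z1)
    lwd = lwd,          # drawing thickness
    depth = depth,      # how far outward the stub is
    branch1 = NULL,     # child slots start out empty
    branch2 = NULL,
    branch3 = NULL,
    extension = NULL
  )
}

# Direction vector u = end - start
direction <- function(stub) stub$end - stub$start

# Recursively count a stub and everything hanging from it
count_stubs <- function(stub) {
  if (is.null(stub)) return(0)
  1 + count_stubs(stub$branch1) + count_stubs(stub$branch2) +
    count_stubs(stub$branch3) + count_stubs(stub$extension)
}

# Example: a vertical trunk with one extension growing in the same direction.
# The 0.5 scale factor and the lwd/depth values are arbitrary choices.
trunk <- new_stub(c(0, 0, 0), c(0, 0, 1), lwd = 4, depth = 0)
u <- direction(trunk)
trunk$extension <- new_stub(trunk$end, trunk$end + 0.5 * u, lwd = 3, depth = 0)
print(count_stubs(trunk))
```

Because every slot holds either NULL or another list of the same shape, any function that walks the tree can be written recursively, as `count_stubs` is.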
Fig. 3. Distribution of two and three branch generating vectors on \(\mathcal{V}\): (a) three branch-defining vectors; (b) two branch-defining vectors.
- 2 branches: We first consider \((1,0)\) as the first vector and we determine the second unit-length vector by choosing a random angle between \(160^\circ\) and \(200^\circ\). We choose another random angle between \(0^\circ\) and \(360^\circ\) to rotate the entire set of two vectors.
- 3 branches: Our three vectors will initially be \((0,1)\), \((\tfrac{\sqrt{3}}{2},-\tfrac{1}{2})\), and \((-\tfrac{\sqrt{3}}{2},-\tfrac{1}{2})\), which are unit vectors spaced \(120^\circ\) apart. We then choose a random angle between \(0^\circ\) and \(360^\circ\) to rotate the entire set of three vectors.
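The two cases above can be sketched in a few lines of R. This is a hedged illustration, not the code from the post: the function name `branch_vectors` is hypothetical, it works only with two-dimensional coordinates within the plane \(\mathcal{V}\), and for the three-branch case it assumes three unit vectors spaced \(120^\circ\) apart starting from \((0,1)\).

```r
# Hypothetical helper: returns a 2 x n matrix whose columns are unit vectors
# giving n branch directions within the plane (n must be 2 or 3)
branch_vectors <- function(n) {
  deg <- pi / 180                       # degrees to radians
  if (n == 2) {
    phi <- runif(1, 160, 200) * deg     # second vector roughly opposite the first
    base <- cbind(c(1, 0), c(cos(phi), sin(phi)))
  } else if (n == 3) {
    base <- cbind(c(0, 1),
                  c(sqrt(3) / 2, -1 / 2),
                  c(-sqrt(3) / 2, -1 / 2))
  } else {
    stop("n must be 2 or 3")
  }
  theta <- runif(1, 0, 360) * deg       # random rotation of the whole set
  rotation <- matrix(c(cos(theta), sin(theta),
                       -sin(theta), cos(theta)), nrow = 2)
  rotation %*% base
}

set.seed(2016)
v2 <- branch_vectors(2)
v3 <- branch_vectors(3)
print(round(v3, 3))
```

Rotating the whole set with a single matrix keeps the angles between the vectors unchanged, so the branches stay evenly spread while each fork points in a different direction.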
R script for the 3d Christmas tree
Bibliography
- Apostol, T. (1985). Calculus, Vol. II (2nd ed). (F. Veléz trans.). Caracas: Editorial Reverté.
- carlop. (April 26, 2012). Answer to "How to find vector perpendicular to another vector?" on math.stackexchange. [Retrieved 12/20/2016 from http://math.stackexchange.com/questions/137362/how-to-find-perpendicular-vector-to-another-vector].
- Cooney, B. (1967). Christmas. New York: Thomas Y. Crowell Company.
- Duineveld, K. (2013, December). "Merry Christmas". Blog post. [Retrieved 12/20/2016 from http://wiekvoet.blogspot.com/2013/12/merry-christmas.html].
- R DEVELOPMENT CORE TEAM (2009). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org.
- Soetaert, K. (2016). plot3D: Plotting multi-dimensional data. R package version 1.1. https://cran.r-project.org/web/packages/plot3D/index.html.
Dear Reader:
If you found this post interesting or useful, please share it on Google+, Facebook or Twitter so that others may find it too.
Below you will find a link to download a printable PDF version of this article for offline reading, along with the R source code featured in this post; essentially, it is the same information I'm sharing with you in this blog entry. The link takes you to a Bitcoin payment gateway that is secure, anonymous and hassle-free (I will not collect your sensitive personal information). The fee is small: 0.0005 BTC, or roughly $0.45 at today's rate. Think of this not as a business transaction but as a donation that will help me continue writing content and survive in this hell-hole that Venezuela has become. Thanks in advance!
Thursday, December 1, 2016
How to send bulk email to your students using R
mailR, which proved to be the right tool for this task.
A brief outline of the educational process at the UNA
On the necessity of creating a better communicational platform
Why do I employ R with the mailR package for bulk emailing instead of other tools?
sendmailR, mailR and gmailr. The latter package looks interesting and on the author's GitHub site [7], there is a tutorial of sorts on how to set it up for sending bulk emails from a Gmail account, coincidentally with an online course application example such as the one I'm discussing in this post. However, I had tried this package before and, for some reason, could not get it to work. In deciding between sendmailR and mailR, I considered that mailR's send.mail function has an option for sending plain-text or HTML mails, whereas the equivalent sendmailR function had plain-text mail hardcoded (a bug that has apparently been fixed since [8]). Being able to send HTML was important to me, as I wanted to be able to send nicely formatted emails like this one:
UNIVERSIDAD NACIONAL ABIERTA CENTRO LOCAL ANZOATEGUI UA EL TIGRE
Hello PEDRO:
First of all, I want to wish you success in this 2016-1 semester, and particularly in the first partial exam of course XXX (code xxx) that you will sit this Saturday. The purpose of this message is to introduce myself: my name is José Romero, I hold a degree in Mathematics from the UNA and I am currently the mathematics advisor at the El Tigre support unit.
I am emailing all students of course XXX (xxx) of the UNA nationwide to invite them to visit my blog.
Needless to say, if you have any questions, do not hesitate to contact me through the blog's message box or at my email: jlaurentum@gmail.com. (NOTE: Do not reply to this email, as it comes from an account used for automated sending only.) I am at your service,
Sincerely,
José Romero
Materials needed for this experiment
mailR package, you will need a file with the data of your students: their email addresses, their names, and any other relevant information you wish to convey to them, such as grades, personal feedback for each student, etc. The file needs to be a csv file, which is essentially a text file in which each line is a row of the data table and the fields or columns in each line are separated by a special character such as a comma, a semicolon, or a tab [9]. If you have your data in a spreadsheet, you can easily convert it to a csv file by using "Save As" and then choosing the "Text/CSV" file type. Indicate the separation character; it must be the same character you specify when you read the csv file into R. Your csv file could contain something like this:
id        lastname  firstname  gender  c_code  una_location  email_address
12345678  PEREZ     PEDRO      M       126     02-01         pedroperezm@dontcare.com
87654321  PEREZ     JOSEFINA   F       126     02-01         jperez@noneofyourbizwaks.com
:         :         :          :       :       :             :
:         :         :          :       :       :             :
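A minimal sketch of reading such a roster into R, assuming the semicolon as the separation character. To keep the example self-contained, the two sample rows are supplied as text rather than read from disk; with a real file you would pass its name instead (for instance a hypothetical "roster.csv").

```r
# The sample roster from above, written with semicolons as separators
roster_text <- "id;lastname;firstname;gender;c_code;una_location;email_address
12345678;PEREZ;PEDRO;M;126;02-01;pedroperezm@dontcare.com
87654321;PEREZ;JOSEFINA;F;126;02-01;jperez@noneofyourbizwaks.com"

# read.csv2 expects ';' as separator and ',' as decimal mark
mail_list <- read.csv2(text = roster_text, stringsAsFactors = FALSE)

# The same data read with read.table, where the separator is given explicitly;
# change sep to "," or "\t" for comma- or tab-separated files
mail_list_alt <- read.table(text = roster_text, header = TRUE, sep = ";",
                            stringsAsFactors = FALSE)

str(mail_list)
```

Setting `stringsAsFactors = FALSE` keeps names and addresses as plain character strings, which is what you want when pasting them into a message.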
Error in ls(envir = envir, all.names = private) :
invalid 'envir' parameter
R script for batch emailing
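The overall shape of such a batch mailing can be sketched as follows. This is a much shorter stand-in written under assumptions, not the 83-line script that the line numbers in the discussion refer to. The `send_batch` function, its `dry_run` switch, the port and `ssl` settings, the subject line and the simplified `message_text` body are all hypothetical; `send.mail` is the mailR function, and it is only called when `dry_run = FALSE` and the mailR package is installed.

```r
# Simplified stand-in for message_text: builds a small HTML body for one
# roster row (the real message adds CSS and the UNA logo)
message_text <- function(student) {
  paste0("<html><body>",
         "<p>Hello ", student$firstname, ":</p>",
         "<p>I invite you to visit my blog.</p>",
         "</body></html>")
}

# Sends one message per roster row and returns a logical vector recording
# which rows succeeded, so an interrupted batch can be resumed later
send_batch <- function(mail_list, smtp, from, subject,
                       dry_run = TRUE, delay = c(3, 6)) {
  sent <- logical(nrow(mail_list))
  for (recipient in seq_len(nrow(mail_list))) {
    student <- mail_list[recipient, ]
    body <- message_text(student)
    if (dry_run) {                 # build the message, contact no server
      sent[recipient] <- TRUE
      next
    }
    result <- try(
      mailR::send.mail(from = from, to = student$email_address,
                       subject = subject, body = body, html = TRUE,
                       smtp = smtp, authenticate = TRUE, send = TRUE),
      silent = TRUE
    )
    sent[recipient] <- !inherits(result, "try-error")
    Sys.sleep(runif(1, delay[1], delay[2]))   # random pause between messages
  }
  sent
}

mail_list <- data.frame(
  firstname = c("PEDRO", "JOSEFINA"),
  email_address = c("pedroperezm@dontcare.com", "jperez@noneofyourbizwaks.com"),
  stringsAsFactors = FALSE
)
# Assumed SMTP settings for a Gmail account; replace with your own
smtp <- list(host.name = "smtp.gmail.com", port = 465,
             user.name = "mygmail", passwd = "your-password", ssl = TRUE)

sent <- send_batch(mail_list, smtp, from = "mygmail@gmail.com",
                   subject = "Greetings from your advisor", dry_run = TRUE)
print(sent)
```

To respect a daily quota, you could call `send_batch` on a slice of the roster, such as `mail_list[1:100, ]`, and use the returned vector to see which rows still need to be sent.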
message_text that builds the HTML message body for a given recipient. This script produces a message body such as the one shown in the example above. Notice the meta tags in the header defining the CSS that formats your message, and the ability to include images like the UNA logo in lines 23-25.
mail_list variable. The read.csv2 function assumes your separation character is a semicolon (;) and the decimal mark is a comma (,), as is customary in Spanish-speaking countries. However, you can configure other settings by using the read.table or read.csv functions. Lines 59-61 simply write a test message to an HTML file. I usually run the script up to these lines first to make sure that the message text comes out the way I want before starting the batch sending process. The for loop in lines 64-83 is responsible for batch emailing to all the students in the roster data frame.
recipient of the for loop (line 64) is an integer going from 1 to the number of rows in the roster data frame. Bear in mind that the daily quota is 100 emails if you are emailing from a free email account like Gmail or Yahoo. Therefore, if your roster file contains more than 100 students, you will want to split it into groups of 100 students and send each group's batch on a different day. Besides, there may be problems while emailing a particular address that cause the script to stop with an error (the Error in ls(envir = ... mentioned earlier). Therefore, you need some way of keeping track of the emails that you send.
send.mail function of the mailR package in lines 66-74. In this example, I'm sending from a fake Gmail account: mygmail@gmail.com. The user name before the at sign (@) of your Gmail address is the one you will pass to the user.name parameter in line 73. In the same line of code, you also have to give the password you use to log in to your email account as the passwd parameter. The host name for Gmail addresses is smtp.gmail.com, but if you use Yahoo or some other free email provider, you have to find out what the SMTP host name is for that provider and give it in the host.name parameter at line 72. A Google lookup should suffice.
send.mail function. This is so my Gmail client won't suspect I'm sending emails in "automatic pilot mode" and my account won't be temporarily suspended. For me, random delays between 3 and 6 seconds worked fairly well, but you may want to experiment with higher delay values to be safe.
Notes
- See McIsaac and Gunawardena (2002) for an account of the history and theoretical constructs behind distance education.
- See What Is Google+? (An African Perspective), an interesting post by Rotimi Orimoloye for his blog Digital Africa. In it, he argues that Nigerians, who also use Facebook more often than any other social network, should transition into Google+, the latter being more suitable for getting to know who the experts in a particular field are and to engage in learning about these fields.
- See the statistics in the FAQ section of Specific Feeds: https://www.specificfeeds.com/page/faq-email-publishers.
- Michael Hyatt, author of a book titled "Platform: Get Noticed in a Noisy World", is an expert in this subject. I strongly recommend visiting his blog https://michaelhyatt.com.
- While there are experts on the subject of platforms like Michael Hyatt and the others I have mentioned, I think that out of necessity, I'm on my way to becoming an expert myself on the subject of building a platform with free or freely available tools. I believe that while technology is in some cases widening the gap between the rich and the poor, free and open source technology holds enormous potential as empowering tools for people who, like myself, live in countries with failed economies.
- See Premraj, 2014.
- Hence the acronym CSV: Comma Separated Values. The separation character can be any character you choose. For this example, we will assume the semicolon (;) is the separation character.
Bibliography
- McIsaac, M. S. and Gunawardena, C. N. (2002). Distance Education. [Retrieved November 26, 2016 from http://www.aect.org/edtech/ed1/pdf/13.pdf].
- WTFPL. (10/18/2016). In Wikipedia, The Free Encyclopedia. [Retrieved 11/15/2016 from: https://en.wikipedia.org/wiki/WTFPL].
- R DEVELOPMENT CORE TEAM (2009). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org.
- Premraj, R. (2014). How to send HTML email using R. [Question on Stack Overflow available at: http://stackoverflow.com/questions/19844762/how-to-send-html-email-using-r].
- YETESOFT.COM. (2015). Email Sending Limit and Send Rate - Gmail, Hotmail, Yahoo! Mail, AOL. [Available at: http://www.yetesoft.com/free-email-marketing-resources/email-sending-limit/].
Sunday, November 20, 2016
Email telemarketing using R
Wednesday, October 5, 2016
R Puzzle No. 2 - Where's the zebra?
The Puzzle
The Zebra Eye, by Singapore artist Jarrell Goh aka fuzzyzebra (2007).
The Zebra puzzle, or Einstein's riddle as it is often called, was popularized in a 1962 issue of Life International magazine, with a follow-up in a March 1963 issue of Life containing the solution. It is said to have been invented by a young Einstein, although it is also credited to Lewis Carroll. The truth is that there is no clear evidence that either of them wrote the riddle. I should also point out that various versions of the puzzle exist, with different nationalities or brands of cigarettes, or even with sports teams replacing the cigarettes.
The version of the puzzle we will be using here is as follows. There are five houses along a street whose five occupants are all of different nationalities (English, German, Norwegian, Swedish, and Danish). Each occupant also smokes a different brand of cigarette (five brands in all), has a different favorite drink, has a different pet and owns a house of a different color than his neighbors. The following clues are given:
- The Englishman lives in the red house.
- The Swede has a dog.
- The Dane drinks tea.
- The man in the green house drinks coffee.
- The white house is on the left of the green house.
- The man in the yellow house smokes Dunhill.
- The man who smokes Pall Mall has a bird.
- The German smokes Prince.
- The man who drinks beer smokes Bluemaster.
- The man that smokes Blends lives next to the man with a cat.
- The man with a horse lives next to the one that smokes Dunhill.
- The man who smokes Blends lives next to the man that drinks water.
- The Norwegian lives next to the blue house.
- The man in the middle house (third house) drinks milk.
- The Norwegian lives in the first house.
A mathematical approach
$2000 and for exterior paint it is $3000. How much interior and exterior paint should the factory produce to maximize gross income?
\[A=\left(\begin{array}{cc} 1 & 2\\ 2 & 1\\ -1 & 1\\ 0 & 1\\ \end{array}\right)\]
With \(X_E, X_I\geq 0\). Alternatively, in non-matrix form, our problem would be to maximize \(z=3000X_E+2000X_I\) subject to the following restrictions:
\[\begin{align*} 1X_E + 2X_I &\leq 6\\ 2X_E + 1X_I &\leq 8\\ -1X_E + 1X_I &\leq 1\\ 0X_E + 1X_I &\leq 2 \end{align*}\]
Again, subject to the non-negativity restrictions on \(X_E\) and \(X_I\).
The lpSolve package
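A minimal sketch of how the paint problem above could be passed to lpSolve, assuming the package is installed. It is not the original listing, so the line numbers mentioned in the discussion need not match it. The names `b`, `constraint_dirs` and `solution`, and the column labels "XE" and "XI", are illustrative choices.

```r
library(lpSolve)

# Objective coefficients: income per unit of exterior and interior paint
z <- c(3000, 2000)

# Constraint matrix, one row per restriction (columns: X_E, X_I)
A <- matrix(c( 1, 2,
               2, 1,
              -1, 1,
               0, 1), nrow = 4, byrow = TRUE)
colnames(A) <- c("XE", "XI")

# Right-hand sides and direction of each restriction
b <- c(6, 8, 1, 2)
constraint_dirs <- rep("<=", 4)

# Non-negativity of the decision variables is assumed by lp
solution <- lp(direction = "max", objective.in = z, const.mat = A,
               const.dir = constraint_dirs, const.rhs = b)
print(solution)
print(solution$solution)
```

The optimum lies where the first two restrictions meet, at \(X_E = 10/3\) and \(X_I = 4/3\), which gives \(z = 3000\cdot\tfrac{10}{3} + 2000\cdot\tfrac{4}{3} \approx 12666.67\).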
Success: the objective function is 12666.67
[1] 3.333333 1.333333
A. In the next line, we give names to the columns that correspond to the decision variables. This is not necessary, but the user must be aware that the solution vector will be given in terms of these variables in the order that they appear in the constraints matrix. Notice that the lp function call in line 7 requires that we specify whether we want a maximum or minimum for the objective function, in this case given by the vector z defined in line 2. A neat feature of the lp function is that for each constraint row of the constraints matrix, we can specify a direction for the inequality as "<=", "=" or ">=", so not all restriction inequalities have to be of the same type.
lp function in lpSolve can handle any of those three types of restrictions for each constraint separately.
lp can handle these cases and even the more restricted cases where the decision variable is binary (either 0 or 1). As we shall see, it is very easy to indicate which variables we want to be real, integer, or binary. So on that aspect of our zebra problem requiring that our variables be restricted to integer values (1,2,3,4 or 5) indicating the house where each item is, we have nothing to worry about.
Not so fast, cowboy
variable_names <- c("W","X","Y","Z")
row1 <- rep(0,4)
names(row1) <- variable_names
row1["X"] <- 1
row1
W X Y Z
0 1 0 0
row1 contains the coefficients for the constraint. The direction of that constraint would be "=", and the corresponding element of the right-hand side vector of the (in this case) equality would be 3. Now say we want to indicate that "W and Z are in the same house". That would be equivalent to stating that \(W=Z\) or \(W-Z=0\). In that case we would add a second row to our constraint matrix as follows:
row2 <- rep(0,4)
names(row2) <- variable_names
row2["W"] <- 1
row2["Z"] <- -1
row2
W X Y Z
1 0 0 -1
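Since row1 and row2 follow the same pattern (start from zeros, then set one or two named coefficients), the construction can be wrapped in a small helper. This is a sketch under assumptions: the function name `constraint_row` and the objects `const_mat`, `const_dir` and `const_rhs` are hypothetical and do not come from the original script.

```r
variable_names <- c("W", "X", "Y", "Z")

# Hypothetical helper: returns a row of zeros with the given named
# coefficients filled in, e.g. constraint_row(c(W = 1, Z = -1))
constraint_row <- function(coefs, vars = variable_names) {
  stopifnot(all(names(coefs) %in% vars))
  row <- setNames(rep(0, length(vars)), vars)
  row[names(coefs)] <- coefs
  row
}

# Stack the rows into a constraint matrix, with matching directions and
# right-hand sides: X = 3 and W - Z = 0
const_mat <- rbind(
  constraint_row(c(X = 1)),
  constraint_row(c(W = 1, Z = -1))
)
const_dir <- c("=", "=")
const_rhs <- c(3, 0)

print(const_mat)
```

With many variables and mostly zero coefficients, naming only the non-zero entries keeps each constraint readable and avoids counting column positions by hand.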
As with row1, the direction (sign) of the row2 constraint would again be "=" and the corresponding right-hand side value would be 0, since we want \(W-Z=0\). You may be thinking that perhaps this is a complicated way to do this, but consider that with 25 variables per linear constraint where most of the coefficients are zero, it makes a lot of sense to set the coefficients of only the one or two variables we are dealing with. In effect, we are seeing that the constraints matrix will be a sparse matrix (mostly filled with zeros). Constraints like "W is to the right of Y" are mathematically expressed as \(W-Y=1\) and it wouldn't be too hard to see how that constraint row would be defined in R.
"<=", "=" and ">="?
And finally, the R solution
House 1 House 2 House 3 House 4 House 5
Nationality "Norwegian" "Dane" "English" "Sweede" "German"
Drinks "Water" "Tea" "Milk" "Beer" "Coffee"
Smokes "Dunhill" "Blend" "Pallmall" "Bluemaster" "Prince"
Pet "Cat" "Horse" "Bird" "Dog" "Zebra"
House color "Yellow" "Blue" "Red" "White" "Green"
Notes
- There is a thorough list of coding implementations (with source code in various languages) of the Zebra Puzzle at the rosettacode.org site (see WILLNESS et al).
- See BERKELAAR (2015).
- See BERKELAAR (2010).
Bibliography
- BERKELAAR, M. et al. (2010). lp_solve 5.5 reference guide: Absolute values. http://lpsolve.sourceforge.net/5.5/asolute.htm.
- BERKELAAR, M. et al. (2015). Interface to 'Lp_solve' v. 5.5 to Solve Linear/Integer Programs. R package "lpSolve". http://cran.r-project.org/.
- GOH, J. (2007) The Zebra eye [Digital visualization]. Retrieved from http://fav.me/d11cbfy.
- R DEVELOPMENT CORE TEAM (2009). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org.
- TAHA, H. (1982). Operations Research - An Introduction. 3rd Edition. New York: MacMillan Publishing Co.
- WILLNESS et al. (2011). Zebra Puzzle. https://rosettacode.org/wiki/Zebra_puzzle.
- Zebra Puzzle. (2016, August 18). In Wikipedia, The Free Encyclopedia. Retrieved 17:01, October 1, 2016, from https://en.wikipedia.org/w/index.php?title=Zebra_Puzzle&oldid=735058353