/media/bill/HOWELL_BASE/Qnial/MY_NDFS/code develop_test/email - extract, sort, cull addresses- develop.txt
www.BillHowell.ca 08Jan2018 initial - to better track changes

*******************
ToDos :

24Oct2018 bash scripts : replace d_temp="/media/bill/ramdisk/" with expression that includes RaspPi

24Oct2018 "add_keys" - were "new key emails" from cumulative current [WCCI,IJCNN,INNS-BDDL] mass emails added?

*******************************************
*****************
24Oct2018 Do "Actions - Entry and processing of emails from responses in spreadsheet “INNS mass email list.ods”"

OK - lexicom.ca IJCNN list 14,463 emails

********************
24Oct2018 Key emails extraction & processing - modify directories of :
"email - extract, sort, cull addresses from text.ndf"

+-----+
1. bash files!!!

d_email_scripts := '/media/bill/SWAPPER/bin/email scripts/' ;

$ ls -1 "/media/bill/SWAPPER/bin/email scripts/"
0_badAss_archive.sh
0_key_emails_archive.sh
0_key_emails_list.sh
0_moved_archive.sh
0_remove_archive.sh
1_remove list for PubCom.sh
archive.sh
email - process folders.sh
Link to email - extract, sort, cull addresses from text.ndf
Link to Website rsync update.sh

>> don't need archive scripts any more!
>> "email - process folders.sh" # Archives the "Remove my emails" list.
>> don't need it anymore either
>> "1_remove list for PubCom.sh" I'll need this IF I do IEEE electronic Copyright form (not planned for IJCNN2019) ... leave it for now ...

After changes :
$ ls -1 "/media/bill/SWAPPER/bin/email scripts/"
0_key_emails_list.sh
1_remove list for PubCom.sh
Link to email - extract, sort, cull addresses from text.ndf
Link to Website rsync update.sh
z_Archive

I corrected "0_key_emails_list.sh" : changed Midas to PROJECTS

+-----+
2.
"# add_keys" & "# keyEmails processing" sections NONLOCAL d_QNial_temp d_keys p_keys ; # define in "setup" section of this file : d_keys := link d_emails 'key emails extraction/' ; f_keymails := '0_key_emails_list.txt' ; p_keys := link d_keys f_keymails ; >> looks OK! qnial> add_keys /media/bill/SWAPPER/bin/email scripts/0_key_emails_list.sh: line 24: cd: /media/bill/Midas/a_INNS Lexicom email server/key emails extraction/: No such file or directory ls: cannot access mass emails/: No such file or directory cat: 0_key_emails_list.txt: No such file or directory cat: 1_contact list.txt: No such file or directory cat: /media/bill/ramdisk/key emails - cumulative emails.txt: No such file or directory >> OOPS! need to change scripts first! After changes as above : qnial> add_keys cat: mass emails/z_Old: Is a directory >> OK - it seemed to run, but were "new key emails" from mass emails added? (probably not - just OrgComm emails) For now, I'm in a rush - check for future mass email +-----+ 3. "4. Process mass email folders" >> Looks OK as I ran these yesterday! (big relief) BUT must re-run AFTER doing key emails processing! Check for host commands!! - might have to modify other scripts? 
>> No, so I should be OK (would have seen a problem yesterday, I would think)

Delete recent output files, re-run :
qnial> process_folders
running "badass"
+----------------------------------------+---+
|remove_key_emailAddresses email_count : |283|
+----------------------------------------+---+
running "movex"
+----------------------------------------+--+
|remove_key_emailAddresses email_count : |46|
+----------------------------------------+--+
running "remex"
+----------------------------------------+-+
|remove_key_emailAddresses email_count : |2|
+----------------------------------------+-+

>> OK - manual cleanup & check files
remove end-of-line ".)]" and start-of-line "["

********************
23Oct2018 Bad addresses especially - problem with inclusion of parentheses, emails ending with "."

Actually, my adaptations for the new "email - split Thunderbird email folder.ndf" are working very well! Was fast to adapt!

email_clean :
% 14Feb2018 for now, just grab the parenthesized emails - split them later ;
>> oops - just do manually tonight!

Don't do "moved" emails at this time!!

*******************
08Jan2018 Bad Address extraction from folder - I am only getting a small percentage (1/4) of the total. WHY?

+---+
random examples of emails that
- are in "Bad addresses" folder
- appear in "Bad address 3 sorted and culled.txt"
- do NOT appear in "Bad address 4 no server emails.txt"

denis.hamad@lasl.univ-littoral.fr
gonda@me.his.u-fukui.ac.jp
jahan@synapse.his.fukui-u.ac.jp
shirvani@math.uni-muenster.de
yoneda@nn.csse.yamaguchi-u.ac.jp

>> AH-HAH! It appears that NO hyphenated emails are included!

+---+
Looking at "/media/bill/Midas/a_INNS Lexicom email server/Bad address processing/" :
This appears in It Why?
In "email - extract, sort, cull addresses from text.ndf" +---+ file_remove_textSetLines IS OP fin_name fot_name text_set { LOCAL line lines ; % ; fin := open fin_name "r ; fot := open fot_name "w ; WHILE (not isfault (line := readfile fin)) DO IF NOT (find_strings text_set line) THEN flag_length := (gage shape line) < 60 ; flag_dash := l ; IF (~= null (hyph_count := (gage shape (find_string '-' line)))) THEN flag_dash := hyph_count < 3 ; ENDIF ; flag_ampersand := l ; IF (~= null (amp_count := (gage shape (find_string '@' line)))) THEN flag_ampersand := amp_count < 2 ; ENDIF ; flag_start := l ; IF (in (first line) '.@''/') THEN flag_start := o ; ENDIF ; IF (AND flag_length flag_dash flag_ampersand flag_start) THEN writefile fot line ; ENDIF ; ENDIF; ENDWHILE; EACH close fin fot ; } +---+ >> This looks entirely OK, but apparently it isn't!! >> The ONLY call to other email processing operators is "find_strings text_set line" >> text_set is email_line_exclude : email_line_exclude := 'ijcnn' '@listserver2.lexi.net' '@localhost' 'postmaster' 'mailer-daemon' '>' 'mailto:' 'no-reply' '+' '=' '%' '$' 'billhowell' ; +---+ In "/media/bill/HOWELL_BASE/Qnial/MY_NDFS/strings.ndf" find_string IS OPERATION Substr Str { Position := first Substr findall Str ; (Substr EACHRIGHT = ( tally Substr EACHRIGHT take ( Position EACHLEFT drop Str ) ) ) sublist position } test : a := 'There is a big bird flying to bird land for the birds' find_string 'bird' a 15 30 48 >> OK, this is fine find_string '-' a >> OK, gave a null Perhaps ANY SubStr shorter than Str is rejected? 
find_string a '-'
null
find_string 'doggy' 'dog'
null

+---+
In "/media/bill/HOWELL_BASE/Qnial/MY_NDFS/strings.ndf"
find_strings IS OPERATION Substrs Str
{ Position := EACH first Substrs EACHLEFT findall Str ;
  or ( Substrs EACHBOTH in ( EACH tally Substrs EACHBOTH EACHRIGHT take ( Position EACHLEFT EACHLEFT drop Str ) ) )
}

test :
a := 'There is a big bird flying to bird land for the birds'
find_strings email_line_exclude 'gonda@me.his.u-fukui.ac.jp'
o
>> That's OK

gage shape 'gonda@me.his.u-fukui.ac.jp'

+---+
Put in this write statement to see which flags fail :
f_test := open '/media/bill/Midas/a_INNS Lexicom email server/Bad address processing/test.txt' "w ;
...
writefile f_test (link line ' ' (tostring flag_length) (tostring flag_dash) (tostring flag_ampersand) (tostring flag_start) ) ;
...
close f_test ;

type : badass
>> WOW! It's flag_dash causing the problem -
Change :
   IF (~= null (hyph_count := find_string '-' line))
To :
   IF (~= null (hyph_count := (gage shape (find_string '-' line))))

>> type lq_emailESC badass
>> OK - it seems to work NOW!!!

# enddoc
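+---+
Cross-check of the flag_dash fix, written as a Python sketch rather than QNial. It rests on my reading of the notes (not confirmed against the QNial interpreter) that find_string returns a list of 0-based match positions, so the pre-fix test compared hyphen positions with 3 instead of the hyphen count. That would reject nearly every hyphenated address. The names find_positions, keep_line_buggy and keep_line_fixed are hypothetical; the exclusion list and sample addresses are copied from the notes above. Substring matching is assumed to be case-sensitive, and empty lines are rejected as a choice made here.

```python
# Python sketch of the line filter in file_remove_textSetLines (NOT the QNial original).
# Assumption: find_string yields 0-based match positions, as in the 'bird' test (15 30 48).

EMAIL_LINE_EXCLUDE = [
    'ijcnn', '@listserver2.lexi.net', '@localhost', 'postmaster',
    'mailer-daemon', '>', 'mailto:', 'no-reply', '+', '=', '%', '$',
    'billhowell',
]


def find_positions(substr, text):
    """Return the 0-based start position of every occurrence of substr in text."""
    if not substr:
        return []
    last_start = len(text) - len(substr)
    return [i for i in range(last_start + 1) if text.startswith(substr, i)]


def _other_flags(line, exclude):
    """Checks shared by both variants: exclusion tokens, length, '@' count, first character."""
    if any(token in line for token in exclude):
        return False
    flag_length = len(line) < 60
    flag_ampersand = len(find_positions('@', line)) < 2
    flag_start = bool(line) and line[0] not in ".@'/"  # empty lines rejected (assumption)
    return flag_length and flag_ampersand and flag_start


def keep_line_buggy(line, exclude=EMAIL_LINE_EXCLUDE):
    """Pre-fix behaviour as I understand it: hyphen POSITIONS are compared with 3."""
    positions = find_positions('-', line)
    flag_dash = all(p < 3 for p in positions)  # vacuously True when there is no hyphen
    return _other_flags(line, exclude) and flag_dash


def keep_line_fixed(line, exclude=EMAIL_LINE_EXCLUDE):
    """Post-fix behaviour: the NUMBER of hyphens is compared with 3."""
    flag_dash = len(find_positions('-', line)) < 3
    return _other_flags(line, exclude) and flag_dash


SAMPLES = [
    'denis.hamad@lasl.univ-littoral.fr',
    'gonda@me.his.u-fukui.ac.jp',
    'jahan@synapse.his.fukui-u.ac.jp',
    'shirvani@math.uni-muenster.de',
    'yoneda@nn.csse.yamaguchi-u.ac.jp',
]

if __name__ == "__main__":
    # Print each sample with the buggy and the fixed verdict side by side.
    for address in SAMPLES:
        print(address, keep_line_buggy(address), keep_line_fixed(address))
```

Running the script prints each of the five sample addresses with the verdict of both variants, which gives a quick way to compare them before re-running "badass" on the real folder.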