Howell - Anthony Fauci's corona virus emails, code for text [rendition, analysis]

Classical wisdom : "... If it ain't broke, don't fix it. ..."
My practice : "... If it ain't broke, break it. ..."
Trust me(???!!!) : Don't try this at home, leave it to the idiots...

Key [results, comments]
Play with the [time, mind]-bending perspective yourself
Ratio of actual to semi-log detrended data : [advantages, disadvantages]
Future potential work
Comparison of [TradingView, Yahoo finance] data
[data, software] cart [description, links]

Summary

Quick & Dirty software was thrown together 25May-30Jun2021 to convert emails in a pdf file to text format, suitable for import into email applications (eg [Thunderbird, Evolution, etc]) that use mbox format. The initial date precedes the date when I first found a copy of the Fauci emails pdf file online.

The main challenge was to address seemingly random [insert, delete]ions of spaces throughout the transformed text. The Adobe Acrobat software probably already avoids this problem altogether. So you might ask : How could I [be so stupid, spend so much time] by doing something so useless? Please let me know when you figure out the answer - I could use the help.

The next stage of software development, to [process, analyze] the emails, has only been touched on, and will be posted at a later stage of development.

Background : Informed Consent Action Network (ICAN)

Informed Consent Action Network (I had never heard of them before) :

"... ICAN OBTAINS OVER 3,000 PAGES OF TONY FAUCI’S EMAILS
Jun 04, 2021, 03:53ET
Last year, ICAN made FOIA requests to NIH for documents regarding COVID-19, including two requests for Anthony Fauci’s emails. ICAN has received nearly 3,000 emails sent by Fauci from early February 2020 through May 2020. Read what Fauci was saying privately about masks, therapeutics, vaccines, ventilators, and many other COVID-19 topics.

On April 10, 2020 and May 5, 2020, respectively, ICAN submitted the following two FOIA requests:

All emails sent by Anthony Fauci between November 1, 2019 and the present that include the term Moderna or mRNA-1273 in any portion of the email.
All emails sent by Anthony Fauci between November 1, 2019 and the present that include the terms SARS-CoV, COVID, COVID-19, or coronavirus in any portion of the email.
When NIH failed to respond to those requests, ICAN brought a lawsuit against the agency on June 29, 2020. In response, NIH agreed to produce Fauci’s emails on a rolling basis. To date, we have received 2,957 pages of Fauci’s sent emails dated between early February 2020 through May 2020 and will continue to receive email productions on a rolling basis.
Read Fauci’s emails here and a few highlights from these emails are outlined below: ..."
(Howell : I haven't listed them here. Just click the link)

Of course, many other lists of questions can be found in amateur blogs, and once in a while good comments can be found on alternative online media. Most online material by amateurs is short-of-inspiring, but the "less-than-one-in-ten-thousand" certainly outshine the professionals. Maybe I'm too cynical, but it seems that it's rare that the mainstream media does much beyond catering to the politically-correct beliefs of their [sponsor, subscriber, public mob, university]s. So you can't be lazy : actively seek out good stuff for issues that are important to you, or sit back and absorb the mainstream crap. But most important of all, do your homework, think critically, and don't just take the words of experts.

I found the emails ~??Jun2021 in a pdf document that was posted by Jason Leopold. See also the article by Natalie Bettendorf, Jason Leopold 01Jun2021 "Anthony Fauci's Emails Reveal The Pressure That Fell On One Man".

Email processing - overall steps

The one-liner descriptions below are taken directly from the "List of operators" of the two key QNial programs for this project. As such, they are an ACTUAL meta-level description of the processing of the pdf email compilation. I admit that this is a horrible excuse for a description, but I've done worse. If I get really excited, I might even do a better job of it.

*********************
loaddefs link d_Qndfs 'emails - convert pdfCompilation to text.ndf' - convert email pdf to text
+-----+
convert pdf of emails to text, initial clean up
pdf_convertTo_txt IS OP pinn pout - convert pdf file to text
pEmails_1stclean_pout IS OP pinn pout -
getHead_from_lines IS OP fout - build email header when From: line has content
getHead_from_blankLines IS OP finn fout - build email header when From: line is empty
pEmails_fixHeads_pout IS OP pinn pout - produce unified, coherent email headers
pEmails_fixBodys_pout IS OP pinn pout - clean up [mis-spaced words, junk-infested lines]
+-----+
reduce word corruption due to [added, removed] spaces by pdftotext, add intro
nDat_indxsSumToNdat_get_ij IS OP numL num - returns indices of numL, sum(numL@(i j)) = num
nDat_indxsSumToNdat_get_ij_test IS - res ipsa loquitur
pEml_addIntro_pout IS OP peml pout - add introduction to pEmailsRaw
+-----+
Extract contentTypes from p_emailsClean
pEmails_get_pContacts IS OP pinn pout sedExpr title introL - sorted list of [To,From,CC] contacts
pEmails_get_pSubjects IS OP pinn pout sedExpr title introL - extract [To,From,CC] subjects
pEmails_saveTo_dMboxDirsPaths IS OP pinn dout - extract each email as a separate file in dEmails
assumes that dEmails already exists - overwrites what is already there
+-----+
Do it all...
pEmails_doALL IS - res ipsa loquitur
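The last extraction step above produces files for mbox-based email applications. As a rough illustration only, here is a minimal Python sketch of writing parsed emails to an mbox file with the standard `mailbox` module. It is not the author's QNial code: the names `build_message` and `write_mbox`, and the (header dict, body string) input shape, are assumptions made for this example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_message(header, body):
    """Turn a dict of header fields plus a body string into an EmailMessage."""
    msg = EmailMessage()
    for field in ("From", "To", "Cc", "Date", "Subject"):
        if header.get(field):  # skip fields that the pdf text did not provide
            msg[field] = header[field]
    msg.set_content(body)
    return msg


def write_mbox(emails, mbox_path):
    """Append each (header, body) pair to an mbox file; return the message count."""
    box = mailbox.mbox(mbox_path)
    try:
        for header, body in emails:
            box.add(build_message(header, body))
        box.flush()
        return len(box)
    finally:
        box.close()


if __name__ == "__main__":
    # Hypothetical sample data, written to a throw-away directory.
    sample = [({"From": "sender@example.org", "To": "reader@example.org",
                "Subject": "test"}, "Body text.\n")]
    path = os.path.join(tempfile.mkdtemp(), "sample.mbox")
    print(write_mbox(sample, path), "message(s) written to", path)
```

Thunderbird and Evolution can both import a file of this kind, although the import steps differ between applications.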

*********************
loaddefs link d_Qndfs 'dictionaries.ndf'
+-----+
[create, process] dictionaries
urlL_make_pDic IS OP urlL pdic - create a wordL from a list of urls
pdwd_pdic_extract_pDif IS OP pdwd pdic pDif - create a specialized dictionary not in old dictionaries
pDicInn_removeApoLines_pDicOut IS OP pDicInn pDicOut - remove lines with apostrophes (apos)
pDicL_catSortUnique_pDic IS OP pDicL pdic -
+-----+
repair p_text using dictionaries (eg pdf files)
pdic_pWrdSorted_merge_pDicWrd IS OP pdic pWrdSorted - [merge, sort] prepTo fix [split, merged] words
fragL_subFragL_getNonNull IS OP nFrag_fragL_subFragL - file read of variables
pTxt_pDic_extract_pfrag_pFragSubs IS OP ptxt pdic pfrg psub -
pClean_replace_pFragL_pSubFragL IS OP pFragL pSubFragL - file read of variables

*********************
The operator "" listed above is worth looking at in more detail. Write statements (output to the terminal screen) help here :
write link timestamp_YYMMDD_HMS '-> generate a sorted list of wordFrags
write link timestamp_YYMMDD_HMS '-> merge pdic and pwrd, sort to pdwd
write link timestamp_YYMMDD_HMS '-> read pdwd, - write [, sub]FragL to prawe
write link timestamp_YYMMDD_HMS '-> read praw, build rawe[, Sup]FragL
write link timestamp_YYMMDD_HMS '-> "invert" raweSupFragL to findSubFragL
write link timestamp_YYMMDD_HMS '-> "right lengthPairs"
write link timestamp_YYMMDD_HMS '-> strL_write_pout [, sub]FragL
write link timestamp_YYMMDD_HMS '-> strL_write_pout

Proper word lists

(slang term : dictionaries, I shouldn't use this phrase)

Proper word lists play an important part in the text repair process.

pTxt_pDic_extract_pfrag_pFragSubs
I need a list of common [begin,end]ings of words
[adjective, noun], [adverb, verb], [conjunctive, prepositions]

Clean dictionaries --+-----------------------+---------------------------
| | | |
| ----------------->- recombine |
| | | | |
V | | | |
diff---> frags ----->+---->+ breakup | |
^ | | |
| | | |
Dirty word list -----| V V |
fixBlends fixSplits |
V
Manual collection of [good, new] words ----------->+-----------------> diff--> vaccine wordList
^
|
|
[url, document] sources of vaccine words -----------

diff :
- perhaps cut off <2 chrs?
- can always search smallFrags later

[url, document] sources of vaccine words, 28Jun2021 starting point :
CDC ???

Software [language, utility, tool, header, program]s

A hybrid environment was used to write the programs :

Unix [command, utility]s (eg [grep, sed, wc]) do the lion's share of the work. The links are to my simple summaries of options and features that I often use, but I provide minimal explanation. More extensive notes provide links to some great blogs on various features, and can be found in the same "bin" directory.
QNial programming language provides the overall structure of the programs, acts as a wrapper for the Unix stuff, but also does some key processing. QNial was the centerfold-of-the-month in Computer Language magazine in ?~Mar1994?. I fell in love, have left her several times, but always found my way back.

I've extracted text from pdf files many times. Perhaps the biggest challenge I've taken on was to automate the editing of thousands of pdf files (keeping them in pdf format) for either the "World Congress on Computational Intelligence (WCCI)" or the "International Joint Conference on Neural Networks (IJCNN)" (see Authors' Guide and software). As usual, many pdf files had to be iteratively corrected with the authors' feedback. Nobody can be expected to get all of the formats right, and authors don't have some of the required material (eg copyrights, etc).

'emails - convert pdfCompilation to text.ndf' - convert email pdf to text
'email analysis - Fauci corona virus header.ndf'
'dictionaries.ndf' - here, dictionaries are actually sorted lists of words

Several operators (same as [function, procedure]s in other computer languages, short term I use is "optr") were specifically developed for this project, but they were put into generic form in other files for use in other projects, including :

strings.ndf :

strL_selectOddNum_subStrL IS OP strL - useful for inclusion in a list of strL for [terminal, file] output
str_subStrs_getLenMatches_subStrPairs IS OP str subStrs - return (str = link subStrPairs)
strL_eachQuoted_strOut IS OP strL - convert strL to a single string of [quote, space]d subStrs
usable in QNial expressions for host commands
example : ('bear' 'bull' 'pig' 'wolf') -> '"bear" "bull" "pig" "wolf"'
strL_to_strExecuteMirror IS OP strL - convert strL to a self-return executable (eg for writefile)
listOfStrL_to_strExecuteMirror IS OP listOfStrL - convert listOfStrL to a self-return executable (eg for writefile)
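The strL_eachQuoted_strOut example above (a list of strings becoming one string of quoted, space-separated items) is easy to mirror in Python. This sketch only reproduces the documented input and output; the function name `each_quoted` is hypothetical, and it assumes the items contain no double quotes of their own.

```python
def each_quoted(str_list):
    """Join strings into one string of double-quoted, space-separated items."""
    return " ".join('"' + s + '"' for s in str_list)


if __name__ == "__main__":
    print(each_quoted(["bear", "bull", "pig", "wolf"]))
```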

fileops.ndf :

pinn_readExecuteLines_aryL IS OP path - make a list of the results of executing each line of pinn "apo-enclose" strings unless they are [variable, operator, transformer, etc] on each line
strL_write_fout IS OP strL fout - write a list of strs, apo-enclosed separated by spaces, as one output line to fout (not a path!), WARNING: assumes length(strL) % max strLength for fileInput
listOfStrL_write_fout IS OP listOfStrL fout - write a list of strPairs, as one output line to fout (not a path!), WARNING: assumes length(strL) % max strLength for fileInput

'QNial setup.ndf'
My core QNial (.ndf) library files are, as usual, used extensively :

'strings.ndf' - a link is provided above
'file_ops.ndf' - a link is provided above

Notes taken during development - my notes are not of much use to anyone else, other than as proof of my stumbling in the process. Still, rare parts of them may help others.

[small, fun] stuff I've learned

sed expression wrapping for [view, correct, remind]ing

As an [old, fat, bald] guy, my [eye, finger, ear]s are fine, it's just what's between them that is failing. Short [grep, sed] expressions are like buttering a slice of bread, long expressions are like being lost in acres of thorn bushes. It was a relief to put in the tiny effort to make them more [read, edit]-able. Here are two examples :

#] sed_getContacts - reformat email headers [Date, To, From, Cc, Subject], pEmails_get_pContacts

tbl :=
'extract lines of interest'.......'s/^From: //I;s/^To: //I;s/^Cc: //I'
'get rid of multiple spaces'.....';s/[ ]\+/ /g'
'get rid of spaces within ()'....';s/(\(.*\) \(.*\))/(\1\2)/g'
'get rid of apos'.................(link ';s/' chr_apo '//g')
'get rid of quotes'..............';s/"//g'
'problematic lineStart1'.........';s/^ //g'
'problematic lineStart2'.........';s/^.+//g'
'firstname tighten'..............';s/ , /, /g'
'lastname tighten'..............';s/, ((/, (/g'
'delete title for alphaSort'.....';s/Dr. //'
;

n_cols := 2 ; n_rows := (floor ((gage shape tbl) / 2)) ;
sed_getContacts := link second cols (n_rows n_cols reshape tbl) ;
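The idea in the listing above is to keep a flat list that alternates [label, sed fragment], then take every second item and concatenate the fragments into one sed expression. A minimal Python sketch of the same idea follows. It is not the QNial code: `sed_from_table` is a hypothetical name, and only three of the fragments are copied into the sample table.

```python
def sed_from_table(flat):
    """Build one sed expression from a flat [label, fragment, ...] list."""
    n_rows = len(flat) // 2  # mirrors the floor(count / 2) in the listing
    # The fragments sit in the "second column", ie at the odd positions.
    fragments = [flat[2 * row + 1] for row in range(n_rows)]
    return "".join(fragments)


if __name__ == "__main__":
    tbl = [
        "extract lines of interest", "s/^From: //I;s/^To: //I;s/^Cc: //I",
        "get rid of quotes", ';s/"//g',
        "firstname tighten", ";s/ , /, /g",
    ]
    print(sed_from_table(tbl))
```

The labels never reach sed. They exist only to remind the reader what each fragment is for.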

#] organization acronyms, pdftotext
# frequent mis-recognitions : (=[({] )=[)}]

sed_orgAcronym :=
'(NIH/CC/DLM)'....';s/[({]N[1IJlTf]H\/CC\/[D0]LM[)}]/(NIH\/CC\/DLM)/'
'(NIH/FIC)'.......';s/[({]N[1IJlTf]H\/F[1IJlTf]C[)}]/(NIH\/FIC)/'
'(NIH/NCI)'.......';s/[({]N[1IJlTf]H\/NC[1IJlTf][)}]/(NIH\/NCI)/'
'(NIH/OD)'........';s/[({]N[1IJlTf]H\/[O0][D0][)}]/(NIH\/OD)/'
'(NIH/VRC)'.......';s/[({]N[1IJlTf]H\/VRC[)}]/(NIH\/VRC)/'
'(CDC/DDID/NCIRD/OD)' ';s/[({]C[D0]C\/[D0][D0][I1][D0]\/NCIRD\/[O0][D0][)}]/(CDC\/DDID\/NCIRD\/OD)/'
'(CDC/OD)'........';s/[({]C[D0]C\/[O0][D0][)}]/(CDC\/OD)/'
'(OS/IOS)'........';s/[({][O0]S\/[1IJlTf]0S[)}]/(OS\/IOS)/'
'(OS/ASPR/IO)'....';s/[({][O0]S\/ASPR\/[1IJlTf][O0][)}]/(OS\/ASPR\/IO)/'
;
;

n_cols := 2 ; n_rows := (floor ((gage shape sed_orgAcronym) / 2)) ;
sed_orgAcronym := second cols (n_rows n_cols reshape sed_orgAcronym) ;

Note that the period sequences in the middle of each line are spaces in the QNial program code; they are shown as periods here for easy html alignment (not the right approach, but fast).
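The acronym repairs above rely on character classes that list the usual pdftotext confusions (for example 1, l, J, T, f in place of I, and 0 in place of O or D). The same trick can be sketched with Python's `re` module. This covers only three of the acronyms, and the names `ACRONYM_FIXES` and `repair_acronyms` are hypothetical.

```python
import re

# Character classes for letters that were often mis-recognized.
I_LIKE = "[1IJlTf]"
O_LIKE = "[O0]"
D_LIKE = "[D0]"
OPEN, CLOSE = "[({]", "[)}]"

ACRONYM_FIXES = [
    (re.compile(OPEN + "N" + I_LIKE + "H/VRC" + CLOSE), "(NIH/VRC)"),
    (re.compile(OPEN + "N" + I_LIKE + "H/" + O_LIKE + D_LIKE + CLOSE), "(NIH/OD)"),
    (re.compile(OPEN + "C" + D_LIKE + "C/" + O_LIKE + D_LIKE + CLOSE), "(CDC/OD)"),
]


def repair_acronyms(text):
    """Replace mis-recognized organization acronyms with their clean form."""
    for pattern, clean in ACRONYM_FIXES:
        text = pattern.sub(clean, text)
    return text


if __name__ == "__main__":
    print(repair_acronyms("sent to {N1H/VRC) and (NlH/0D} and {C0C/OD)"))
```

Unlike a sed s/// command, a Python pattern needs no escaping of the "/" characters.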

This combines sed expressions :

#] sed_pdftotext reformat, pdftotext

sed_pdftotext := link sed_format1 sed_formatHeads sed_orgAcronym ;

sed_pdftotext is used as the argument "sedExpr" in the QNial operator (function, procedure equivalent) :
pPdf_convertTo_pTxt IS OP pinn pout sedExpr
This is in the form of a host call from QNial, my usual simple way of hybridizing [QNial, bash] :
host link 'pdftotext "' pinn '" "' p_temp '"' ;
host link 'sed "' sedExpr '" "' p_temp '" >"' pout '" ' ;
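For comparison, the same two-step host call (pdftotext, then sed) can be sketched in Python with `subprocess`. This is an illustration under assumptions rather than a replacement for the QNial operator: `build_commands` and `convert` are hypothetical names, the default temporary file name is invented, and both pdftotext and sed must already be installed. Passing each command as an argument list avoids the nested quoting needed when building a shell string.

```python
import subprocess


def build_commands(pinn, p_temp, sed_expr):
    """Return the argument lists for the pdftotext and sed steps."""
    to_text = ["pdftotext", pinn, p_temp]
    clean = ["sed", sed_expr, p_temp]
    return to_text, clean


def convert(pinn, pout, sed_expr, p_temp="temp_pdftotext.txt"):
    """Convert a pdf to text, then write the sed-cleaned text to pout."""
    to_text, clean = build_commands(pinn, p_temp, sed_expr)
    subprocess.run(to_text, check=True)  # raises if pdftotext fails
    with open(pout, "w", encoding="utf-8") as fout:
        subprocess.run(clean, check=True, stdout=fout)


if __name__ == "__main__":
    # Show the commands only; nothing is executed here.
    for cmd in build_commands("emails.pdf", "temp.txt", "s/[ ]\\+/ /g"):
        print(cmd)
```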

Inversion of 1-level-nested lists of strings

OK, so maybe this is an ancient achievement of computer science, with jillions of interesting optimisations, but it is interesting for me. I have blindly applied brute force approaches in 'dictionaries.ndf' -> pTxt_pDic_extract_pfrag_pFragSubs. This project "inverts" a list of words [merge, sort]ed from words in the text from the pdf and a standard Linux american-english dictionary (list of words). The challenge is simple on the surface, but for large enough [text files, dictionaries] it could become a challenge. I suspect that the brute force approach may be "order of N-squared" (O(N^2), where N is the number of data items).
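One common way to avoid an O(N^2) search when inverting a 1-level-nested list is to build a hash-based index in a single pass. The Python sketch below shows that general technique. It is not what the QNial operator does, and the input shape (a list of [word, list of fragments] pairs) and the name `invert_nested` are assumptions made for the example.

```python
from collections import defaultdict


def invert_nested(pairs):
    """Invert [(key, [sub, ...]), ...] into {sub: [key, ...]} in one pass."""
    inverted = defaultdict(list)
    for key, subs in pairs:
        for sub in subs:
            inverted[sub].append(key)  # average constant-time dict update
    return dict(inverted)


if __name__ == "__main__":
    pairs = [("vaccine", ["vac", "cine"]), ("vacuum", ["vac", "uum"])]
    print(invert_nested(pairs))
```

The cost grows with the total number of fragments rather than with the square of the number of words, at the price of holding the index in memory.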

This reminds me of Moore-Penrose matrix inversion, a challenge for large datasets with Guang-Bin Huang's "Extreme Learning Machines" (ELMs).

Software and tools

GNU Unix [commands, utilities, bash scripts] :
The lion's share of the work is done by very basic, and fairly simple (but easy-to-screw-up) Unix tools. There is no way that I could code at this level! These are the first things that children should be taught right after they learn to [read, write, type].
- pdftotext - popular Linux utility to transform pdf documents to text-only format
- grep - "global regular expression print", used frequently in combination with [sed, find] below. This is a foundational tool across [U, Li]nix. I really wish I had spent the time to learn it properly decades ago, and I still have a long way to go.
- sed - I probably do at least 2 orders of magnitude more editing with sed than with all [text editors, word processors, spreadsheets, etc, etc] combined.
- find - it feels weird, but I actually don't make use of "find" for this super-simple project, as it doesn't deal with many thousands of files.
- no bash scripts? - Again, this feels very weird, but then again there are several simple one-liner bash sequences wrapped in QNial operators.
- ???
Most of my other bash scripts are listed here (check sub-directories).
QNial programming language - Queen's University of Kingston, Ontario, Canada, "Q'Nial Nested Interactive Array Language" is my top preferred programming language for modestly complex to insane programming challenges, a preference I share with perhaps 3 other people in the world (if they are still alive).
Most of my other Q'Nial programs are listed in this directory (check sub-directories as well). Here's a simple list of many, if not most, of my Q'Nial operator libraries, showing their purpose.
gimp (GNU image manipulation program) is what I used for the SP500 time-section transparencies. For more details, see the section above "Play with the [time, mind]-bending perspective yourself".
- Tesla options pricing is an example of a gnuplot script. I won't be doing that for this project un[less, til] I complete the "Calculus of Words" work.
- ???
- ???
- ???
- ???
- ???
- ???
Other toolsets

Future potential work

The calculus of words
The fractional order calculus of words - Harken back to the Great War between Gottfried Leibniz and Isaac Newton over who invented calculus, and proceed from there. Finally, >300 years later, things are starting to happen!
Geometrical Deep Learning neural networks (eg drug discovery) - Michael Bronstein's plenary talk for the 2020 World Congress on Computational Intelligence was an inspiring insight.
Robert Hecht-Nielsen, Confabulation Theory versus Bayes Theorem
Lee Giles, CiteSeer - full of ideas on how to build systems.
Tom Cobb, tools for natural language processing

Warning, waiver

WARNING - the pdftotext process produces many errors, especially space-gaps in words.
13Jun2021 This process requires a fair amount of un[certain, finish]ed "cleaning" to get useable results.
PRIVACY concerns : the "names" of email addresses have been removed (name@affiliation)

Waiver/ Disclaimer :
The reformatting of this document does NOT reflect the [past, current, future] cart [policy, priority, direction, opinion]s of [this author, employer, work colleague, family, friends, acquaintance]s. This reformat has NOT been approved nor sanctioned at any level by any person or organization, nor has it been checked for errors. The reader is warned that there IS a [warranty, guarantee] as to the accuracy of the reformatting herein : it sucks! The application of this reformat could quite possibly result in severe losses and/or damages to the [author, reader, associate, organization, country, entire human specie]s. The author accepts no responsibility for damages or loss arising from the application of [any, all] part of this reformat, neither for the reader nor third parties. This webPage is one fruit of my madness, and it would be mad to take it for anything else, and if you did, then who is maddest?