Education

Ph.D. in Electrical and Computer Engineering,

University of Maryland, College Park, MD, May 2003.

Ph.D. Thesis: Probabilistic Methods for Searching OCR-Degraded Arabic Text

Thesis Supervisor: Dr. Douglas W. Oard

Major area: Computer Engineering

Minor area: Microelectronics

GPA: 3.73


Master's Degree in Electrical and Computer Engineering,

University of Maryland, College Park, MD, August 1999.

GPA: 3.40


Bachelor of Electrical and Computer Engineering,

University of Maryland, College Park, MD, December 1995.

Magna Cum Laude, GPA: 3.88


Experience

 Senior Scientist – Qatar Computing Research Institute

Feb. 2011 – Present

  • Defining the priorities of information retrieval and natural language processing research at QCRI
  • Leading and participating in collaborations between QCRI and external entities such as Aljazeera.net and Boeing
  • Conducting productive research in the following areas:
    • Social search
      • Improving language handling for search using language dependent and language independent methods
      • Performing topic detection and filtering in social media
      • Developing cross-media (social media and news) summarization (e.g., tweetMogaz.com)
    • Web search
      • Building the infrastructure for an Arabic web search engine, including crawling, distributed indexing, distributed search, etc.
    • Basic natural language processing
      • Developing tools to perform stemming, part-of-speech tagging, named entity recognition, phrase detection, automatic language detection, Arabizi-to-Arabic conversion, automatic diacritization, parsing, etc.
    • Social computing
      • Developing technologies for the automated analysis and understanding of social media streams
      • Applying the methodologies to specific case studies, including turmoil in Egypt, ISIS sympathizers, Islamophobia, and xenophobia


Researcher – Microsoft, Cairo

Mar. 2007 – Feb. 2011

Conducting productive research in the following projects:

  • BookWeb:
    • Exploring digitized content by cross-linking topical segments, and automatically extracted key phrases, to related topical segments
    • Automatically identifying table-of-contents pages and linking their entries to the appropriate pages
    • Automatically constructing tables of contents by identifying headlines in digitized content
    • Improving search of digitized content by identifying the most valuable parts of the content
    • Resulting in: a ThinkWeek paper; an invention disclosure; collaboration with MSRC (Natasa Milic-Frayling’s group); a joint TechFest 2008 demo with MSRC; papers at INEX and a CIKM workshop
  • IBIS:
    • Measuring the Arabic search effectiveness of Bing (formerly Live Search)
    • Working on improving search effectiveness:
      • Index coverage: supplied a whitelist of good Arabic URLs for manual boosting of their static rank
      • Word breaking: designed, tested, and helped code a new Arabic word breaker
    • Resulting in: a ThinkWeek paper; a whitelist that helped quadruple Bing’s Arabic index; a word breaker that was checked into Bing’s source tree
  • Transbulletization:
    • Attempting to use parsing and statistical summarization techniques to trim superfluous parts from sentences translated into English from other languages, without breaking the flow of the sentences
    • Integrating the concept into a cross-language search application
    • Resulting in: a TechFest 2009 demo; an invention disclosure
  • Enterprise document linking:
    • Attempting to help users navigate intranets by providing contextual links/recommendations based on a user’s:
      • Current context: searching using salient terms in context
      • Browsing history: utilizing relevance feedback and filtering
    • Providing related resources such as documentation, people pages, etc.
    • Resulting in: a TechFest 2009 demo; a tool slated to become the official TechFest 2010 search/browsing tool; incubation with the FuturePoint group
  • Search results diversification:
    • Attempting to identify a query’s different meanings (extrinsic diversity) or different facets (intrinsic diversity) to diversify top search results
    • Using knowledge bases to perform diversification
    • Using density-based clustering to perform diversification
    • Resulting in: a paper submitted to ECIR-2010; participation in the TREC-2009 Relevance Feedback track (in collaboration with Stephen Robertson’s group at MSRC), with an oral presentation at TREC; collaboration with the PAMI group at the University of Waterloo (Dr. Mohamed Kamel)
  • OCRless retrieval:
    • Language-independent searching of document images without performing OCR, by clustering similar connected components and rendering queries into images
    • Applying IR techniques such as weighted structured queries to improve retrieval effectiveness
    • Applying density-based clustering to improve clustering
    • Resulting in: a TechFest 2009 demo; an invention disclosure; a SPIRE-2009 paper
  • Machine translation:
    • Developing Arabic language handling to improve Ar <=> En MT
    • Designing and testing an Arabic word breaker for Ar => En MT to improve lexical coverage
    • Helping design and test Arabic reverse word breaker for En => Ar MT
    • Performing acronym expansion to aid En => any language MT
    • Detecting named entities as targets for transliteration
    • Transliterating Ar <=> En named entities
    • Resulting in: tech transfer of the Arabic word breaker into the MSR MT system; tech transfer of the reverse word breaker expected in Nov. 2009
  • Bing instant answers:
    • Performing cross-language search to improve Arabic image search by overcoming the lack of Arabic meta-information about images
      • Arabic queries are translated into English, and the results are shown to the user
    • Translating English instant answers automatically into Arabic
    • Translating English Wikipedia infoboxes into Arabic to aid Arabic question answering based on knowledge bases
    • Resulting in: two ThinkWeek papers; development of instant answers in progress


Researcher – IBM, Cairo

Mar. 2005 – Mar. 2007

  • Performing productive research and development in the following areas:
    • Arabic OCR-degraded text retrieval
      • Incorporating character, word, stem, and stem-template based language modeling to improve error correction
      • Adapting blind relevance feedback to OCR-degraded documents
      • Investigating and developing techniques for building relevance judgments for OCR-degraded collections without pooling
      • Developing degraded Arabic word clustering to improve search term highlighting
    • Machine-assisted human translation and human-assisted machine translation
      • Using the output of a machine translation system to build localized language models to improve type-ahead for translators
      • Incorporating speech recognition technology with existing machine translation technology to speed up human translation
    • Information extraction from biomedical text
      • Employing unsupervised learning techniques to infer patterns that contain relationships/interactions between biomedical named entities
      • Applying the inferred models to extract protein-protein interactions
    • Adaptive cross-language text filtering
  • Team leader, InfoMind Project
    • Integrating MT, IE, IR, adaptive filtering, and information visualization
    • Managing research engineers working on various parts of the project


Associate Professor – Cairo University, Cairo

Aug. 2005 – Aug. 2016

Information Systems Department, Faculty of Computer Science and Informatics

  • Designing and teaching courses:
    • Data structures
    • Unstructured document retrieval
  • Managing teaching assistants
  • Supervising research assistants working on:
    • Rapid development of IR test collections
    • Arabic-Hebrew cross-language retrieval
    • Employing Arabic morphological analysis in word clustering and highlighting of search results
    • Interactive query expansion
    • Automatic web page structural analysis
    • Wikipedia named entity tagging
  • Supervising senior graduation projects: natural language question authoring; multilingual desktop search; affect resolution; browser history caching and search; Arabic named entity recognition

Egyptian Ministry of Communications and Information Technology Research Center of Excellence (co-PI)

  • Developing a web portal for aggregating Arabic web news (www.alzoa.com)
    • Deploying state-of-the-art Arabic text search
    • Using automatic document clustering
    • Investigating automatic Arabic phrase extraction for summarization purposes
    • Investigating ways to anonymously customize web pages to suit specific users
    • Exploring ways to make news interactive and solicit user input


Lecturer – German University in Cairo

Jan. 2004 – Aug. 2005

Department of Information Engineering and Technology

  • Designing and teaching introductory electrical engineering courses:
    • Basic Circuit Theory
    • Electric Circuits Lab
    • Digital Logic Design
  • Managing teaching and research assistants
  • Performing research in Arabic information retrieval
    • Comparing different word stemming and clustering techniques for Arabic information retrieval
    • Devising new methods for rapid construction of test collections for monolingual and cross-lingual retrieval
  • Performing research in Bioinformatics
    • Evaluating the effectiveness of controlled vocabularies in information retrieval
    • Designing automatic methods for assigning controlled vocabulary entries to documents
    • Participating in the 2004 Text REtrieval Conference (TREC) Genomics Track (ranked 3rd in the ad hoc retrieval task and 6th in the triage task)
  • Collaborating with the Library of Alexandria in the Million Book Project
    • Introducing and evaluating information retrieval technology for scanned and OCR’ed Arabic books
    • Exploring and evaluating error-tolerant Arabic word clustering and morphology techniques


Senior Consultant – KEVRIC, Silver Spring, MD

Jun. 2003 – Feb. 2004

Knowledge Management project – under contract from the National Institutes of Health (NIH)

  • Served as Principal Investigator (PI) on the project to evaluate emerging knowledge management technologies intended to facilitate NIH’s grant review process
  • Identifying the needs of scientific review administrators tasked with routing incoming grant proposals to appropriate reviewers
  • Researching existing technologies and developing alternative ones to address the stated needs
  • Establishing criteria for evaluating potentially viable technologies
  • Evaluating the usability and effectiveness of the different technologies

The BISC project – under contract from NIH

  • Served as a research scientist to facilitate the adoption and integration of varying biomedical ontologies intended to support clinical research applications
  • Migrating different biomedical ontologies in varying formats (such as DAML and OWL) to standardized formats to facilitate their integration
  • Designing applications that rely on the ontologies and defining insertion points for integrating the ontologies into the applications
  • Working on interfacing applications and the ontologies using APIs