Education
Ph.D. in Electrical and Computer Engineering,
University of Maryland, College Park, MD, May 2003.
Ph.D. Thesis: Probabilistic Methods for Searching OCR-Degraded Arabic Text
Thesis Supervisor: Dr. Douglas W. Oard
Major area: Computer Engineering
Minor area: Microelectronics
GPA: 3.73
Master’s Degree in Electrical and Computer Engineering,
University of Maryland, College Park, MD, August 1999.
GPA: 3.40
Bachelor of Electrical and Computer Engineering,
University of Maryland, College Park, MD, December 1995.
Magna Cum Laude, GPA: 3.88
Experience
Senior Scientist – Qatar Computing Research Institute
Feb. 2011 – Present
- Defining the priorities of information retrieval and natural language processing research at QCRI
- Leading and participating in collaborations between QCRI and external entities such as Aljazeera.net and Boeing
- Conducting productive research in the following areas:
- Social search
- Improving language handling for search using language-dependent and language-independent methods
- Performing topic detection and filtering in social media
- Developing cross-media (social media and news) summarization (e.g., tweetMogaz.com)
- Web search
- Building the infrastructure for an Arabic web search engine including: crawling, distributed indexing, distributed search, etc.
- Basic natural language processing
- Developing tools to perform stemming, part-of-speech tagging, named entity recognition, phrase detection, automatic language detection, Arabizi-to-Arabic conversion, automatic diacritization, parsing, etc.
- Social computing
- Developing technologies for the automated analysis and understanding of social media streams
- Applying the methodologies to specific case studies including turmoil in Egypt, ISIS sympathizers, Islamophobia, and xenophobia.
Researcher – Microsoft, Cairo
Mar. 2007 – Feb. 2011
Conducting productive research in the following projects:
- BookWeb:
- Exploring digitized content by cross-linking topical segments and linking automatically extracted key phrases to relevant topical segments
- Automatically identifying table-of-contents pages and linking their entries to the appropriate pages
- Automatically constructing tables of contents by identifying headlines in digitized content
- Improving search of digitized content by identifying its most valuable parts
- Resulting in: a ThinkWeek paper; an invention disclosure; collaboration with MSRC (Natasa Milic-Frayling’s group); a joint TechFest 2008 demo with MSRC; papers at INEX and a CIKM workshop.
- IBIS:
- Measuring Bing’s (formerly Live) search effectiveness for Arabic
- Working on improving search effectiveness:
- Index coverage: supplied a white list of good Arabic URLs for manual boosting of their static rank
- Word breaking: designed, tested, and helped code a new Arabic word breaker
- Resulting in: a ThinkWeek paper; a white list that helped quadruple Bing’s Arabic index; a word breaker that was checked into Bing’s tree
- Transbulletization:
- Attempting to trim superfluous parts from sentences translated into English from other languages, using parsing and statistical summarization techniques, without breaking sentence flow
- Integrating the concept into a cross-language search application
- Resulting in: TechFest 2009 demo; invention disclosure
- Enterprise document linking:
- Attempting to help users navigate intranets by providing contextual links/recommendations based on a user’s:
- Current context: searching using salient terms in context
- Browsing history: utilizing relevance feedback and filtering
- Providing related resources such as documentation, people pages, etc.
- Resulting in: a TechFest 2009 demo; a tool slated to become the official TechFest 2010 search/browsing tool; incubation with the FuturePoint group
- Search results diversification:
- Attempting to identify a query’s different meanings (extrinsic diversity) or different facets (intrinsic diversity) to diversify top search results
- Using knowledge bases to perform diversification
- Using density-based clustering to perform diversification
- Resulting in: a paper submitted to ECIR-2010; participation in the TREC-2009 relevance feedback track (in collaboration with MSRC’s Stephen Robertson’s group) with an oral presentation at TREC; collaboration with the PAMI group at the University of Waterloo (Dr. Mohamed Kamel)
- OCRless retrieval:
- Language-independent searching of document images without performing OCR, by clustering similar connected components and rendering queries as images
- Applying IR techniques such as weighted structured queries to improve retrieval effectiveness
- Applying density-based clustering to improve clustering
- Resulting in: TechFest 2009 demo; invention disclosure; SPIRE-2009 paper
- Machine translation:
- Developing Arabic language handling to improve Ar <=> En MT
- Designing and testing an Arabic word breaker for Ar => En MT to improve lexical coverage
- Helping design and test Arabic reverse word breaker for En => Ar MT
- Performing acronym expansion to aid En => any language MT
- Detecting named entities as targets for transliteration
- Transliterating Ar <=> En named entities
- Resulting in: tech transfer of the Arabic word breaker into the MSR MT system; reverse word breaker tech transfer expected in Nov. 2009
- Bing instant answers:
- Performing cross language search to improve image search for Arabic by overcoming the lack of Arabic meta-information about images
- Translating Arabic queries into English and showing the results to the user
- Translating English instant answers automatically into Arabic
- Translating English Wikipedia infoboxes into Arabic to aid Arabic question answering based on knowledge bases
- Resulting in: two ThinkWeek papers; development of instant answers in progress
Researcher – IBM, Cairo
Mar. 2005 – Mar. 2007
- Performing productive research and development in the following areas:
- Arabic OCR-degraded text retrieval
- Incorporating character, word, stem, and stem-template based language modeling to improve error correction
- Adapting blind relevance feedback to OCR-degraded documents
- Investigating/developing techniques for building relevance judgments for OCR-degraded collections without pooling
- Clustering degraded Arabic words to improve search term highlighting
- Machine-assisted human translation and human-assisted machine translation
- Using the output of a machine translation system to build localized language models that improve type-ahead for translators
- Incorporating speech recognition technology into existing machine translation technology to speed up human translation
- Information extraction from biomedical text
- Employing unsupervised learning techniques to infer patterns that contain relationships/interactions between biomedical named entities
- Applying the inferred models to extract protein-protein interactions
- Adaptive cross-language text filtering
- Team leader, InfoMind Project
- Integrating MT, IE, IR, adaptive filtering, and information visualization
- Managing research engineers working on various parts of the project
Associate Professor – Cairo University, Cairo
Aug. 2005 – Aug. 2016
Information Systems Department, Faculty of Computer Science and Informatics
- Designing and teaching courses:
- Data structures
- Unstructured document retrieval
- Managing teaching assistants
- Supervising research assistants working on:
- Rapid development of IR test collections
- Arabic-Hebrew cross language retrieval
- Employing Arabic morphological analysis in word clustering and highlighting of search results
- Interactive query expansion
- Automatic web page structural analysis
- Wikipedia named entity tagging
- Supervising senior graduation projects: natural language question authoring; multi-lingual desktop search; affect resolution; browser history caching and search; Arabic named entity recognition
Egyptian Ministry of Communication and Information Technology Research Center of Excellence (co-PI)
- Developing a web portal for aggregating Arabic web news (www.alzoa.com)
- Deploying state-of-the-art Arabic text search
- Using automatic document clustering
- Investigating automatic Arabic phrase extraction for summarization purposes
- Investigating ways to anonymously customize web pages to suit specific users
- Exploring ways to make news interactive and solicit user input
Lecturer – German University in Cairo
Jan. 2004 – Aug. 2005
Department of Information Engineering and Technology
- Designing and teaching introductory Electrical Engineering courses
- Basic Circuit Theory
- Electric Circuits Lab
- Digital Logic Design
- Managing teaching and research assistants
- Performing research in Arabic information retrieval
- Comparing different word stemming and clustering techniques for Arabic information retrieval
- Devising new methods for rapid construction of test collections for monolingual and cross-lingual retrieval
- Performing research in Bioinformatics
- Evaluating the effectiveness of controlled vocabularies in information retrieval
- Designing automatic methods for assigning controlled vocabulary entries to documents
- Participating in the 2004 Text REtrieval Conference (TREC) Genomics Track (ranked 3rd in the Ad Hoc Retrieval task and 6th in the Triage task)
- Collaborating with the Library of Alexandria in the Million Book Project
- Introducing and evaluating information retrieval technology for scanned and OCR’ed Arabic books
- Exploring and evaluating error-tolerant Arabic word clustering and morphology techniques
Senior Consultant – KEVRIC, Silver Spring, MD
Jun. 2003 – Feb. 2004
Knowledge Management project – under contract from the National Institutes of Health (NIH)
- Served as Principal Investigator (PI) on the project to evaluate emerging knowledge management technologies intended to facilitate NIH’s grant review process
- Identifying the needs of scientific review administrators tasked with routing incoming grant proposals to appropriate reviewers
- Researching existing technologies and developing alternative ones to address the stated needs
- Establishing criteria for evaluating potentially viable technologies
- Evaluating the usability and effectiveness of the different technologies
The BISC project – under contract from NIH
- Served as a research scientist to facilitate the adoption and integration of varying biomedical ontologies intended to support clinical research applications
- Migrating different biomedical ontologies in varying formats (such as DAML and OWL) to standardized formats to facilitate their integration
- Designing applications that rely on the ontologies and defining insertion points for integrating the ontologies into the applications
- Working on interfacing the applications with the ontologies using APIs
