Prof. Meng - Emrecan Tarakci
Latest revision as of 21:27, 2 May 2015
Introduction
After discussing and agreeing on the project with Professor Weiyi Meng, I (Emrecan Tarakci) started working on the project known as Publication Analysis on Google Scholar. The project fulfills the requirements of the Senior Project I & II courses. The program mainly focuses on extracting records from Google Scholar, such as the names of the authors, the title of the paper, the year of publication, the publication venue, and the citation count. After extracting and storing the records, the program computes the counts of self-citations and non-self-citations, the i10-index, the H-index, and the number of publications per year for each academic. Since the program will first be used for the Watson faculty members, it also computes the total citation count, the total non-self-citation count, the average citation count, the average non-self-citation count, the total i10-index, the total i10-index based on non-self-citations, the average i10-index, the average i10-index based on non-self-citations, the average H-index, the average H-index based on non-self-citations, and the ratio of total non-self-citations to total citations.
Technical Details
The program is developed in Visual Studio 2013 for Desktop. I used C# as the programming language and Microsoft SQL Server for database management.
Project Requirements
The goal of this project is to extract and analyze the publications and citations of university faculty based on their Google Scholar pages.
Stage 1: Extract basic publication and citation information for a given faculty member.
Input: The URL of the faculty member's Google Scholar profile page.
Outputs: Every individual publication of the faculty member. For each publication, extract its title, authors, publication venue (conference name for conference publications, journal name and volume/issue numbers for journal publications), page numbers, publication year, citation count, and citation link. Individual author names should be separated. The citation link is the URL of the (first) page that lists the publications citing the publication under consideration. The extracted records are exported to an XML file and an Excel file.
Requirement: Minimize the number of query submissions/downloads from the Google Scholar site.
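As an illustration of the Stage 1 output, the following is a minimal sketch of one publication record and its XML export. It is written in Python for brevity rather than the C# used in the project. The field names, the `split_authors` helper, and the XML element names are all assumptions made for this example, not the program's actual schema, and page numbers and the Excel export are left out.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Publication:
    # Hypothetical record layout; the real program's schema may differ.
    title: str
    authors: List[str]
    venue: str
    year: int
    citation_count: int
    citation_link: str


def split_authors(raw: str) -> List[str]:
    """Separate a comma-delimited author string into individual names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def to_xml(publications: List[Publication]) -> str:
    """Serialize records to an XML string, one <publication> per record."""
    root = ET.Element("publications")
    for pub in publications:
        node = ET.SubElement(root, "publication")
        ET.SubElement(node, "title").text = pub.title
        authors = ET.SubElement(node, "authors")
        for name in pub.authors:
            ET.SubElement(authors, "author").text = name
        ET.SubElement(node, "venue").text = pub.venue
        ET.SubElement(node, "year").text = str(pub.year)
        ET.SubElement(node, "citationCount").text = str(pub.citation_count)
        ET.SubElement(node, "citationLink").text = pub.citation_link
    return ET.tostring(root, encoding="unicode")
```

Storing the authors as a list, rather than as one string, is what makes the later self-citation check a simple comparison of names.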
Stage 2: Extract information about all publications that cite a given publication P and determine whether each citation is a self-citation.
Input: A given publication P and the URL L of the (first) page that lists the publications citing P.
Output: The number of self-citations and non-self-citations for P among the publications that cite P. A publication p1 that cites P is a self-citation if p1 and P share at least one author.
Requirement: Minimize the number of query submissions/downloads from the Google Scholar site.
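The self-citation rule of Stage 2 (p1 and P share at least one author) can be sketched as a set intersection. This is a Python illustration, not the project's C# code. The `normalize` step (lowercasing and collapsing whitespace) is an assumption; it would not match an abbreviated name such as "W Meng" against "Weiyi Meng", so a real implementation would likely need more careful name matching.

```python
from typing import Iterable, List, Tuple


def normalize(name: str) -> str:
    """Lowercase a name and collapse whitespace (a simplifying assumption)."""
    return " ".join(name.lower().split())


def is_self_citation(cited_authors: Iterable[str],
                     citing_authors: Iterable[str]) -> bool:
    """True if the two author lists share at least one normalized name."""
    cited = {normalize(a) for a in cited_authors}
    citing = {normalize(a) for a in citing_authors}
    return not cited.isdisjoint(citing)


def count_citations(cited_authors: List[str],
                    citing_author_lists: List[List[str]]) -> Tuple[int, int]:
    """Return (self-citation count, non-self-citation count) for one paper."""
    self_count = sum(
        1 for authors in citing_author_lists
        if is_self_citation(cited_authors, authors)
    )
    return self_count, len(citing_author_lists) - self_count
```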
Stage 3: Combine the Stage 1 and Stage 2 programs to find the non-self-citation count for every publication of a given faculty member from the Google Scholar site.
Input: The URL of the faculty member's Google Scholar profile page.
Output: The same as for Stage 1, except that the non-self-citation count for each publication is added to the result.
Stage 4: Compute the i10-index (the number of publications that have at least 10 citations) and the H-index (the largest number h such that there are h papers with at least h citations each) based on both the total citations and the non-self-citations. Also compute the ratio of non-self-citations to total citations.
Input: The output of Stage 3.
Output: The i10-index based on total citations, the i10-index based on non-self-citations, the H-index based on total citations, the H-index based on non-self-citations, and the ratio of non-self-citations to total citations.
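The two indices follow directly from their definitions. The sketch below is in Python rather than the project's C#, and the function names are chosen for this example only. Each function takes a list of per-publication citation counts, so the same code yields the total-citation or the non-self-citation variant depending on which counts are passed in.

```python
from typing import List


def i10_index(citations: List[int]) -> int:
    """Number of publications with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


def h_index(citations: List[int]) -> int:
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def non_self_ratio(total: List[int], non_self: List[int]) -> float:
    """Ratio of non-self-citations to total citations (0.0 if no citations)."""
    total_sum = sum(total)
    return sum(non_self) / total_sum if total_sum else 0.0
```

For example, for citation counts 10, 8, 5, 4 and 3, four papers have at least 4 citations but there are not five papers with at least 5, so the H-index is 4; only one paper reaches 10 citations, so the i10-index is 1.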
Stage 5: Divide the publication records of a given faculty member by year.
Input: The output of Stage 1.
Output: The input divided by year, with the publications for more recent years listed first.
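A minimal Python sketch of the Stage 5 grouping is shown here (the project itself is written in C#). Representing each record as a `(title, year)` pair is an assumption made to keep the example short.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_year(
    publications: Iterable[Tuple[str, int]]
) -> List[Tuple[int, List[str]]]:
    """Group (title, year) pairs by year, most recent year first."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for title, year in publications:
        groups[year].append(title)
    # Sort the (year, titles) pairs by year in descending order.
    return sorted(groups.items(), key=lambda item: item[0], reverse=True)
```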
Stage 6: For a list of Google Scholar faculty profiles, compute the total citation count, the total non-self-citation count, the average citation count, the average non-self-citation count, the total i10-index, the total i10-index based on non-self-citations, the average i10-index, the average i10-index based on non-self-citations, the average H-index, the average H-index based on non-self-citations, and the ratio of total non-self-citations to total citations.
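The Stage 6 aggregates can be sketched as follows, in Python rather than the project's C#. The `FacultyStats` record and the dictionary keys are hypothetical names, and the sketch assumes that every average is taken per faculty member, which the requirement does not state explicitly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FacultyStats:
    # Hypothetical per-faculty summary, as produced by the earlier stages.
    citations: int
    non_self_citations: int
    i10: int
    i10_non_self: int
    h: int
    h_non_self: int


def summarize(faculty: List[FacultyStats]) -> Dict[str, float]:
    """Totals and per-faculty averages over a list of faculty profiles."""
    if not faculty:
        raise ValueError("at least one faculty profile is required")
    n = len(faculty)
    total_cit = sum(f.citations for f in faculty)
    total_ns = sum(f.non_self_citations for f in faculty)
    total_i10 = sum(f.i10 for f in faculty)
    total_i10_ns = sum(f.i10_non_self for f in faculty)
    return {
        "total_citations": total_cit,
        "total_non_self_citations": total_ns,
        "avg_citations": total_cit / n,
        "avg_non_self_citations": total_ns / n,
        "total_i10": total_i10,
        "total_i10_non_self": total_i10_ns,
        "avg_i10": total_i10 / n,
        "avg_i10_non_self": total_i10_ns / n,
        "avg_h": sum(f.h for f in faculty) / n,
        "avg_h_non_self": sum(f.h_non_self for f in faculty) / n,
        "non_self_ratio": total_ns / total_cit if total_cit else 0.0,
    }
```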
Weekly Progress
Since I completed the first two phases during the first semester, I started with the third phase at the beginning of the spring semester.
Week 1 & 2 & 3 & 4
Working on Phase 3
The major difficulty was that sending many requests to Google's server resulted in being banned by Google (essentially a ban on the local IP address).
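This page does not record how the ban was handled. One common way to reduce the number of requests, in line with the "minimize downloads" requirement, is to cache every downloaded page and to wait between requests. The Python sketch below illustrates that idea only; the `ThrottledFetcher` class, its parameters, and the 5-second default interval are assumptions for this example and are not taken from the project.

```python
import time
from typing import Callable, Dict, Optional


class ThrottledFetcher:
    """Wraps a download function with a page cache and a minimum delay."""

    def __init__(self,
                 fetch: Callable[[str], str],
                 min_interval: float = 5.0,  # arbitrary illustrative value
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, str] = {}
        self._last_request: Optional[float] = None

    def get(self, url: str) -> str:
        # A cached page costs no request at all.
        if url in self._cache:
            return self._cache[url]
        # Otherwise wait until min_interval has passed since the last request.
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        page = self._fetch(url)
        self._last_request = self._clock()
        self._cache[url] = page
        return page
```

The download function is passed in rather than hard-coded, so the throttling logic can be exercised without contacting any server.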
Week 5 & 6 & 7
Working on Phase 4
Week 8 & 9
Working on Phase 5
Week 10 & 11 & 12
Working on Phase 6

